Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-04-08 Thread Christophe Leroy




Le 08/04/2020 à 01:32, Benjamin Herrenschmidt a écrit :

On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:

Benjamin Herrenschmidt  writes:

On Tue, 2020-03-31 at 16:30 +1100, Michael Ellerman wrote:

I have no attachment to 40x, and I'd certainly be happy to have
less
code in the tree, we struggle to keep even the modern platforms
well
maintained.

At the same time I don't want to render anyone's hardware
obsolete
unnecessarily. But if there's really no one using 40x then we
should
remove it, it could well be broken already.

So I guess post a series to do the removal and we'll see if
anyone
speaks up.


We shouldn't remove 40x completely. Just remove the Xilinx 405
stuff.


Congratulations on becoming the 40x maintainer!


Didn't I give you my last 40x system ? :-) IBM still put 40x cores
inside POWER chips no ?



According to Wikipedia (https://en.wikipedia.org/wiki/PowerPC_400), 405 
cores are used in modern equipments (how modern ?), however 403 has 
never reached the market.


Can we start removing 403 stuff ? That's not a lot, but still.

Does anybody knows anything about this ERRATUM 77 stuff ? Is that still 
an issue with all 405 cores or has this been fixed long time ago and can 
be removed ?


Christophe


Re: [PATCH] cxl: Rework error message for incompatible slots

2020-04-08 Thread Frederic Barrat




Le 08/04/2020 à 04:13, Andrew Donnellan a écrit :

On 7/4/20 9:56 pm, Frederic Barrat wrote:

Improve the error message shown if a capi adapter is plugged on a
capi-incompatible slot directly under the PHB (no intermediate switch).

Fixes: 5632874311db ("cxl: Add support for POWER9 DD2")
Cc: sta...@vger.kernel.org # 4.14+
Signed-off-by: Frederic Barrat 


Seems fine to me, not sure if it needs to go to stable but I suppose 
this could be causing actual confusion out in the field?



Yes it does. The reason for this patch is it was hit by a customer.

  Fred



Reviewed-by: Andrew Donnellan 




Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

2020-04-08 Thread Anshuman Khandual


On 04/07/2020 09:24 PM, Gerald Schaefer wrote:
> On Sun, 5 Apr 2020 17:58:14 +0530
> Anshuman Khandual  wrote:
> 
> [...]
>>>
>>> Could be fixed like this (the first de-reference is a bit special,
>>> because at that point *ptep does not really point to a large (pmd) entry
>>> yet, it is initially an invalid pte entry, which breaks our huge_ptep_get() 
>>>  
>>
>> There seems to be an inconsistency on s390 platform. Even though it defines
>> a huge_ptep_get() override, it does not subscribe __HAVE_ARCH_HUGE_PTEP_GET
>> which should have forced it fallback on generic huge_ptep_get() but it does
>> not :) Then I realized that __HAVE_ARCH_HUGE_PTEP_GET only makes sense when
>> an arch uses . s390 does not use that and hence gets
>> away with it's own huge_ptep_get() without __HAVE_ARCH_HUGE_PTEP_GET. Sounds
>> confusing ? But I might not have the entire context here.
> 
> Yes, that sounds very confusing. Also a bit ironic, since huge_ptep_get()
> was initially introduced because of s390, and now we don't select
> __HAVE_ARCH_HUGE_PTEP_GET...
> 
> As you realized, I guess this is because we do not use generic hugetlb.h.
> And when __HAVE_ARCH_HUGE_PTEP_GET was introduced with commit 544db7597ad
> ("hugetlb: introduce generic version of huge_ptep_get"), that was probably
> the reason why we did not get our share of __HAVE_ARCH_HUGE_PTEP_GET.

Understood.

> 
> Nothing really wrong with that, but yes, very confusing. Maybe we could
> also select it for s390, even though it wouldn't have any functional
> impact (so far), just for less confusion. Maybe also thinking about
> using the generic hugetlb.h, not sure if the original reasons for not
> doing so would still apply. Now I only need to find the time...

Seems like something worth to explore if we could remove this confusion.

> 
>>
>>> conversion logic. I also added PMD_MASK alignment for RANDOM_ORVALUE,
>>> because we do have some special bits there in our large pmds. It seems
>>> to also work w/o that alignment, but it feels a bit wrong):  
>>
>> Sure, we can accommodate that.
>>
>>>
>>> @@ -731,26 +731,26 @@ static void __init hugetlb_advanced_test
>>>   unsigned long vaddr, pgprot_t 
>>> prot)
>>>  {
>>> struct page *page = pfn_to_page(pfn);
>>> -   pte_t pte = READ_ONCE(*ptep);
>>> +   pte_t pte;
>>>
>>> -   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> +   pte = pte_mkhuge(mk_pte_phys(RANDOM_ORVALUE & PMD_MASK, prot));  
>>
>> So that keeps the existing value in 'ptep' pointer at bay and instead
>> construct a PTE from scratch. I would rather have READ_ONCE(*ptep) at
>> least provide the seed that can be ORed with RANDOM_ORVALUE before
>> being masked with PMD_MASK. Do you see any problem ?
> 
> Yes, unfortunately. The problem is that the resulting pte is not marked
> as present. The conversion pte -> (huge) pmd, which is done in
> set_huge_pte_at() for s390, will establish an empty pmd for non-present
> ptes, all the RANDOM_ORVALUE stuff is lost. And a subsequent
> huge_ptep_get() will not result in the same original pte value. If you

Ohh.

> want to preserve and check the RANDOM_ORVALUE, it has to be a present
> pte, hence the mk_pte(_phys).

Understood and mk_pte() is also available on all platforms.

> 
>>
>> Some thing like this instead.
>>
>> pte_t pte = READ_ONCE(*ptep);
>> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
>>
>> We cannot use mk_pte_phys() as it is defined only on some platforms
>> without any generic fallback for others.
> 
> Oh, didn't know that, sorry. What about using mk_pte() instead, at least
> it would result in a present pte:
> 
> pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));

Lets use mk_pte() here but can we do this instead

paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));

> 
> And if you also want to do some with the existing value, which seems
> to be an empty pte, then maybe just check if writing and reading that
> value with set_huge_pte_at() / huge_ptep_get() returns the same,
> i.e. initially w/o RANDOM_ORVALUE.
> 
> So, in combination, like this (BTW, why is the barrier() needed, it
> is not used for the other set_huge_pte_at() calls later?):

Ahh missed, will add them. Earlier we faced problem without it after
set_pte_at() for a test on powerpc (64) platform. Hence just added it
here to be extra careful.

> 
> @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
> struct page *page = pfn_to_page(pfn);
> pte_t pte = READ_ONCE(*ptep);
>  
> -   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> +   set_huge_pte_at(mm, vaddr, ptep, pte);
> +   WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> +
> +   pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), 
> prot));
> set_huge_pte_at(mm, vaddr, ptep, pte);
> barrier();
> WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> 
> Th

Re: [PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB

2020-04-08 Thread Alexey Kardashevskiy



On 23/03/2020 18:53, Alexey Kardashevskiy wrote:
> Here is an attempt to support bigger DMA space for devices
> supporting DMA masks less than 59 bits (GPUs come into mind
> first). POWER9 PHBs have an option to map 2 windows at 0
> and select a windows based on DMA address being below or above
> 4GB.
> 
> This adds the "iommu=iommu_bypass" kernel parameter and
> supports VFIO+pseries machine - current this requires telling
> upstream+unmodified QEMU about this via
> -global spapr-pci-host-bridge.dma64_win_addr=0x1
> or per-phb property. 4/4 advertises the new option but
> there is no automation around it in QEMU (should it be?).
> 
> For now it is either 1<<59 or 4GB mode; dynamic switching is
> not supported (could be via sysfs).
> 
> This is a rebased version of
> https://lore.kernel.org/kvm/20191202015953.127902-1-...@ozlabs.ru/
> 
> The main change since v1 is that now it is 7 patches with
> clearer separation of steps.
> 
> 
> This is based on 6c90b86a745a "Merge tag 'mmc-v5.6-rc6' of 
> git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc"
> 
> Please comment. Thanks.

Ping?


> 
> 
> 
> Alexey Kardashevskiy (7):
>   powerpc/powernv/ioda: Move TCE bypass base to PE
>   powerpc/powernv/ioda: Rework for huge DMA window at 4GB
>   powerpc/powernv/ioda: Allow smaller TCE table levels
>   powerpc/powernv/phb4: Use IOMMU instead of bypassing
>   powerpc/iommu: Add a window number to
> iommu_table_group_ops::get_table_size
>   powerpc/powernv/phb4: Add 4GB IOMMU bypass mode
>   vfio/spapr_tce: Advertise and allow a huge DMA windows at 4GB
> 
>  arch/powerpc/include/asm/iommu.h  |   3 +
>  arch/powerpc/include/asm/opal-api.h   |   9 +-
>  arch/powerpc/include/asm/opal.h   |   2 +
>  arch/powerpc/platforms/powernv/pci.h  |   4 +-
>  include/uapi/linux/vfio.h |   2 +
>  arch/powerpc/platforms/powernv/npu-dma.c  |   1 +
>  arch/powerpc/platforms/powernv/opal-call.c|   2 +
>  arch/powerpc/platforms/powernv/pci-ioda-tce.c |   4 +-
>  arch/powerpc/platforms/powernv/pci-ioda.c | 234 ++
>  drivers/vfio/vfio_iommu_spapr_tce.c   |  17 +-
>  10 files changed, 213 insertions(+), 65 deletions(-)
> 

-- 
Alexey


Re: [RFC] Support stop state version quirk and firmware enabled stop

2020-04-08 Thread Gautham R Shenoy
Hi Pratik,

On Wed, Mar 04, 2020 at 09:26:48PM +0530, Pratik Rajesh Sampat wrote:
> A concept patch in Skiboot to illustrate the case wherein handling of
> stop states for different DD versions of a CPU can be achieved by a
> simple modification in the list of cpu_features.
> As an example idle-stop1 is defined which uses P9_CPU_DD1 to define the
> cpu feature.
> 
> Along with that, an implementation is being worked upon the LE OPAL
> series which helps OPAL handle the stop state entry and exit.
> 
> This patch advertises this capability of the firmware which can be
> availed if the quirk-version-setting is not cognizable.
> 
> The firmware-enabled stop is being worked by Abhishek Goel
>  building upon the LE OPAL series.
> 
> Signed-off-by: Pratik Rajesh Sampat 
> ---
>  core/cpufeatures.c | 22 ++
>  1 file changed, 22 insertions(+)
> 
> diff --git a/core/cpufeatures.c b/core/cpufeatures.c
> index ec30c975..b9875e7b 100644
> --- a/core/cpufeatures.c
> +++ b/core/cpufeatures.c
> @@ -510,6 +510,25 @@ static const struct cpu_feature cpu_features_table[] = {
>   -1, -1, -1,
>   NULL, },
> 
> + /*
> +  * QUIRK for ISAv3.0B stop idle instructions and registers
> +  * Helps us determine if there are any quirks
> +  * XXX: Same of idle-stop
> +  */
> + { "idle-stop-v1",
> + CPU_P9_DD1,
> + ISA_V3_0B, USABLE_HV|USABLE_OS,
> + HV_CUSTOM, OS_CUSTOM,
> + -1, -1, -1,
> + NULL, },


So, at this point, we don't need any such quirk for any of the DD
version right ? This is to demonstrate that if say P9_DD1 had a quirk
w.r.t stop-state handling, then this is how we would advertise it to
the kernel.

> +
> + { "firmware-stop-supported",
> + CPU_P9,
> + ISA_V3_0B, USABLE_HV|USABLE_OS,
> + HV_CUSTOM, OS_CUSTOM,
> + -1, -1, -1,
> + NULL, },
> +


I suppose this is for the opal-cpuidle driver support posted here:
https://lists.ozlabs.org/pipermail/skiboot/2020-April/016726.html

>   /*
>* ISAv3.0B Hypervisor Virtualization Interrupt
>* Also associated system registers, LPCR EE, HEIC, HVICE,
> @@ -883,6 +902,9 @@ static void add_cpufeatures(struct dt_node *cpus,
>   const struct cpu_feature *f = &cpu_features_table[i];
> 
>   if (f->cpus_supported & cpu_feature_cpu) {
> + if (!strcmp(f->name, "firmware-stop-supported") &&
> + HAVE_BIG_ENDIAN)
> + continue;

In OPAL do we have an macro defining BIG_ENDIAN ? If yes, you could
wrap the "firmware-stop-supported" in cpu_features_table[] within
#ifndef BIG_ENDIAN. That way you won't need a special case here.



>   DBG("  '%s'\n", f->name);
>   add_cpu_feature_nodeps(features, f);
>   }
> -- 
> 2.24.1
> 

--
Thanks and Regards
gautham.


[PATCH] i2c: powermac: Simplify reading the "reg" and "i2c-address" property

2020-04-08 Thread Aishwarya R
Use of_property_read_u32 to read the "reg" and "i2c-address" property
instead of using of_get_property to check the return values.

Signed-off-by: Aishwarya R 
---
 drivers/i2c/busses/i2c-powermac.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/i2c/busses/i2c-powermac.c 
b/drivers/i2c/busses/i2c-powermac.c
index d565714c1f13..81506c2dab65 100644
--- a/drivers/i2c/busses/i2c-powermac.c
+++ b/drivers/i2c/busses/i2c-powermac.c
@@ -207,18 +207,18 @@ static u32 i2c_powermac_get_addr(struct i2c_adapter *adap,
   struct pmac_i2c_bus *bus,
   struct device_node *node)
 {
-   const __be32 *prop;
-   int len;
+   u32 prop;
+   int ret;
 
/* First check for valid "reg" */
-   prop = of_get_property(node, "reg", &len);
-   if (prop && (len >= sizeof(int)))
-   return (be32_to_cpup(prop) & 0xff) >> 1;
+   ret = of_property_read_u32(node, "reg", &prop);
+   if (ret == 0)
+   return (prop & 0xff) >> 1;
 
/* Then check old-style "i2c-address" */
-   prop = of_get_property(node, "i2c-address", &len);
-   if (prop && (len >= sizeof(int)))
-   return (be32_to_cpup(prop) & 0xff) >> 1;
+   ret = of_property_read_u32(node, "i2c-address", &prop);
+   if (ret == 0)
+   return (prop & 0xff) >> 1;
 
/* Now handle some devices with missing "reg" properties */
if (of_node_name_eq(node, "cereal"))
-- 
2.17.1



Re: [RFC 1/3] Interface for an idle-stop dependency structure

2020-04-08 Thread Gautham R Shenoy
Hi Pratik,

On Wed, Mar 04, 2020 at 09:31:21PM +0530, Pratik Rajesh Sampat wrote:
> Design patch to introduce the idea of having a dependency structure for
> idle-stop. The structure encapsulates the following:
> 1. Bitmask for version of idle-stop
> 2. Bitmask for propterties like ENABLE/DISABLE
> 3. Function pointer which helps handle how the stop must be invoked
> 
> The commit lays a foundation for other idle-stop versions to be added
> and handled cleanly based on their specified requirments.
> Currently it handles the existing "idle-stop" version by setting the
> discovery bits and the function pointer.

So, if this patch is applied, and we are running with an OPAL that
doesn't publish the "idle-stop" dt-cpu-feature, then the goal is to
not enable any stop states. Is this correct ?


> 
> Signed-off-by: Pratik Rajesh Sampat 
> ---
>  arch/powerpc/include/asm/processor.h  | 17 +
>  arch/powerpc/kernel/dt_cpu_ftrs.c |  5 +
>  arch/powerpc/platforms/powernv/idle.c | 17 +
>  drivers/cpuidle/cpuidle-powernv.c |  3 ++-
>  4 files changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/processor.h 
> b/arch/powerpc/include/asm/processor.h
> index eedcbfb9a6ff..da59f01a5c09 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -429,6 +429,23 @@ extern void power4_idle_nap(void);
>  extern unsigned long cpuidle_disable;
>  enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
> 
> +#define STOP_ENABLE  0x0001
> +
> +#define STOP_VERSION_P9   0x1
> +
> +/*
> + * Classify the dependencies of the stop states
> + * @idle_stop: function handler to handle the quirk stop version
> + * @cpuidle_prop: Signify support for stop states through kernel and/or 
> firmware
> + * @stop_version: Classify quirk versions for stop states
> + */
> +typedef struct {
> + unsigned long (*idle_stop)(unsigned long, bool);
> + uint8_t cpuidle_prop;
> + uint8_t stop_version;

Why do we need both cpuidle_prop and stop_version ? 


> +}stop_deps_t;
> +extern stop_deps_t stop_dep;
> +
>  extern int powersave_nap;/* set if nap mode can be used in idle loop */
> 
>  extern void power7_idle_type(unsigned long type);
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
> b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 182b4047c1ef..db1a525e090d 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -292,6 +292,8 @@ static int __init feat_enable_idle_stop(struct 
> dt_cpu_feature *f)
>   lpcr |=  LPCR_PECE1;
>   lpcr |=  LPCR_PECE2;
>   mtspr(SPRN_LPCR, lpcr);
> + stop_dep.cpuidle_prop |= STOP_ENABLE;
> + stop_dep.stop_version = STOP_VERSION_P9;

Doesn't stop_version != 0 imply (stop_dep.cpuidle_prop & STOP_ENABLE)?

> 
>   return 1;
>  }
> @@ -657,6 +659,9 @@ static void __init cpufeatures_setup_start(u32 isa)
>   }
>  }
> 
> +stop_deps_t stop_dep = {NULL, 0x0, 0x0};
> +EXPORT_SYMBOL(stop_dep);
> +
>  static bool __init cpufeatures_process_feature(struct dt_cpu_feature *f)
>  {
>   const struct dt_cpu_feature_match *m;
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 78599bca66c2..c32cdc37acf4 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -812,7 +812,7 @@ static unsigned long power9_offline_stop(unsigned long 
> psscr)
> 
>  #ifndef CONFIG_KVM_BOOK3S_HV_POSSIBLE
>   __ppc64_runlatch_off();
> - srr1 = power9_idle_stop(psscr, true);
> + srr1 = stop_dep.idle_stop(psscr, true);
>   __ppc64_runlatch_on();
>  #else
>   /*
> @@ -828,7 +828,7 @@ static unsigned long power9_offline_stop(unsigned long 
> psscr)
>   local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_IDLE;
> 
>   __ppc64_runlatch_off();
> - srr1 = power9_idle_stop(psscr, false);
> + srr1 = stop_dep.idle_stop(psscr, true);
>   __ppc64_runlatch_on();
> 
>   local_paca->kvm_hstate.hwthread_state = KVM_HWTHREAD_IN_KERNEL;
> @@ -856,7 +856,7 @@ void power9_idle_type(unsigned long stop_psscr_val,
>   psscr = (psscr & ~stop_psscr_mask) | stop_psscr_val;
> 
>   __ppc64_runlatch_off();
> - srr1 = power9_idle_stop(psscr, true);
> + srr1 = stop_dep.idle_stop(psscr, true);
>   __ppc64_runlatch_on();
>

There is one other place in arch/powerpc/kvm/book3s_hv_rmhandlers.S
where call isa300_idle_stop_mayloss (this is kvm_nap_sequence).

So, if stop states are not supported, then, KVM subsystem should know
about it. Some KVM configurations depend on putting the secondary
threads of the core offline into an idle state whose wakeup is from
0x100 vector. Your patch doesn't address that part.

>   fini_irq_for_idle_irqsoff();
> @@ -1353,8 +1353,17 @@ static int __init pnv_init_idle_states(void)
>   nr_pnv_idle_states = 0;
>   supported_cpuidle_states = 0;
> 
> - if (cpuidle_disable != ID

Re: [RFC 3/3] Introduce capability for firmware-enabled-stop

2020-04-08 Thread Gautham R Shenoy
Hi Pratik,

On Wed, Mar 04, 2020 at 09:31:23PM +0530, Pratik Rajesh Sampat wrote:
> Design patch that introduces the capability for firmware to handle the
> stop states instead. A bit is set based on the discovery of the feature
> and correspondingly also the responsibility to handle the stop states.
> 
> The commit does not contain calling into the firmware to utilize
> firmware enabled stop.
> 
> Signed-off-by: Pratik Rajesh Sampat 
> ---
>  arch/powerpc/include/asm/processor.h | 1 +
>  arch/powerpc/kernel/dt_cpu_ftrs.c| 9 +
>  2 files changed, 10 insertions(+)
> 
> diff --git a/arch/powerpc/include/asm/processor.h 
> b/arch/powerpc/include/asm/processor.h
> index 277dbabafd02..978fab35d133 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -430,6 +430,7 @@ extern unsigned long cpuidle_disable;
>  enum idle_boot_override {IDLE_NO_OVERRIDE = 0, IDLE_POWERSAVE_OFF};
> 
>  #define STOP_ENABLE  0x0001
> +#define FIRMWARE_STOP_ENABLE 0x0010


This could be made a bit in the "version" variable.

> 
>  #define STOP_VERSION_P9   0x1
>  #define STOP_VERSION_P9_V10x2
> diff --git a/arch/powerpc/kernel/dt_cpu_ftrs.c 
> b/arch/powerpc/kernel/dt_cpu_ftrs.c
> index 63e30aa49356..e00f8afabc46 100644
> --- a/arch/powerpc/kernel/dt_cpu_ftrs.c
> +++ b/arch/powerpc/kernel/dt_cpu_ftrs.c
> @@ -313,6 +313,14 @@ static int __init feat_enable_idle_stop_quirk(struct 
> dt_cpu_feature *f)
> 
>   return 1;
>  }
> +
> +static int __init feat_enable_firmware_stop(struct dt_cpu_feature *f)
> +{
> + stop_dep.cpuidle_prop |= FIRMWARE_STOP_ENABLE;

stop_dep.cpuidle_version |= FIRMWARE_STOP_V1; or some such
variant.


> +
> + return 1;
> +}
> +
>  static int __init feat_enable_mmu_hash(struct dt_cpu_feature *f)
>  {
>   u64 lpcr;
> @@ -608,6 +616,7 @@ static struct dt_cpu_feature_match __initdata
>   {"alignment-interrupt-dsisr", feat_enable_align_dsisr, 0},
>   {"idle-stop", feat_enable_idle_stop, 0},
>   {"idle-stop-v1", feat_enable_idle_stop_quirk, 0},
> + {"firmware-stop-supported", feat_enable_firmware_stop, 0},
>   {"machine-check-power8", feat_enable_mce_power8, 0},
>   {"performance-monitor-power8", feat_enable_pmu_power8, 0},
>   {"data-stream-control-register", feat_enable_dscr, CPU_FTR_DSCR},
> -- 
> 2.24.1
> 


Re: [RFC PATCH 2/3] powerpc/lib: Initialize a temporary mm for code patching

2020-04-08 Thread Christophe Leroy




Le 31/03/2020 à 05:19, Christopher M Riedl a écrit :

On March 24, 2020 11:10 AM Christophe Leroy  wrote:

  
Le 23/03/2020 à 05:52, Christopher M. Riedl a écrit :

When code patching a STRICT_KERNEL_RWX kernel the page containing the
address to be patched is temporarily mapped with permissive memory
protections. Currently, a per-cpu vmalloc patch area is used for this
purpose. While the patch area is per-cpu, the temporary page mapping is
inserted into the kernel page tables for the duration of the patching.
The mapping is exposed to CPUs other than the patching CPU - this is
undesirable from a hardening perspective.

Use the `poking_init` init hook to prepare a temporary mm and patching
address. Initialize the temporary mm by copying the init mm. Choose a
randomized patching address inside the temporary mm userspace address
portion. The next patch uses the temporary mm and patching address for
code patching.

Based on x86 implementation:

commit 4fc19708b165
("x86/alternatives: Initialize temporary mm for patching")

Signed-off-by: Christopher M. Riedl 
---
   arch/powerpc/lib/code-patching.c | 26 ++
   1 file changed, 26 insertions(+)

diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
index 3345f039a876..18b88ecfc5a8 100644
--- a/arch/powerpc/lib/code-patching.c
+++ b/arch/powerpc/lib/code-patching.c
@@ -11,6 +11,8 @@
   #include 
   #include 
   #include 
+#include 
+#include 
   
   #include 

   #include 
@@ -39,6 +41,30 @@ int raw_patch_instruction(unsigned int *addr, unsigned int 
instr)
   }
   
   #ifdef CONFIG_STRICT_KERNEL_RWX

+
+__ro_after_init struct mm_struct *patching_mm;
+__ro_after_init unsigned long patching_addr;


Can we make those those static ?



Yes, makes sense to me.


+
+void __init poking_init(void)
+{
+   spinlock_t *ptl; /* for protecting pte table */
+   pte_t *ptep;
+
+   patching_mm = copy_init_mm();
+   BUG_ON(!patching_mm);


Does it needs to be a BUG_ON() ? Can't we fail gracefully with just a
WARN_ON ?



I'm not sure what failing gracefully means here? The main reason this could
fail is if there is not enough memory to allocate the patching_mm. The
previous implementation had this justification for BUG_ON():


But the system can continue running just fine after this failure.
Only the things that make use of code patching will fail (ftrace, kgdb, ...)

Checkpatch tells: "Avoid crashing the kernel - try using WARN_ON & 
recovery code rather than BUG() or BUG_ON()"


All vital code patching has already been done previously, so I think a 
WARN_ON() should be enough, plus returning non 0 to indicate that the 
late_initcall failed.





/*
  * Run as a late init call. This allows all the boot time patching to be done
  * simply by patching the code, and then we're called here prior to
  * mark_rodata_ro(), which happens after all init calls are run. Although
  * BUG_ON() is rude, in this case it should only happen if ENOMEM, and we judge
  * it as being preferable to a kernel that will crash later when someone tries
  * to use patch_instruction().
  */
static int __init setup_text_poke_area(void)
{
 BUG_ON(!cpuhp_setup_state(CPUHP_AP_ONLINE_DYN,
 "powerpc/text_poke:online", text_area_cpu_up,
 text_area_cpu_down));

 return 0;
}
late_initcall(setup_text_poke_area);

I think the BUG_ON() is appropriate even if only to adhere to the previous
judgement call. I can add a similar comment explaining the reasoning if
that helps.


+
+   /*
+* In hash we cannot go above DEFAULT_MAP_WINDOW easily.
+* XXX: Do we want additional bits of entropy for radix?
+*/
+   patching_addr = (get_random_long() & PAGE_MASK) %
+   (DEFAULT_MAP_WINDOW - PAGE_SIZE);
+
+   ptep = get_locked_pte(patching_mm, patching_addr, &ptl);
+   BUG_ON(!ptep);


Same here, can we fail gracefully instead ?



Same reasoning as above.


Here as well, a WARN_ON() should be enough, the system will continue 
running after that.





+   pte_unmap_unlock(ptep, ptl);
+}
+
   static DEFINE_PER_CPU(struct vm_struct *, text_poke_area);
   
   static int text_area_cpu_up(unsigned int cpu)




Christophe


Christophe


Re: [RFC] cpuidle/powernv : Support for pre-entry and post exit of stop state in firmware

2020-04-08 Thread Gautham R Shenoy
Hi Abhishek,

On Fri, Apr 03, 2020 at 04:27:01AM -0500, Abhishek Goel wrote:
> This patch provides kernel framework fro opal support of save restore
> of sprs in idle stop loop. Opal support for stop states is needed to
> selectively enable stop states or to introduce a quirk quickly in case
> a buggy stop state is present.
> 
> We make a opal call from kernel if firmware-stop-support for stop
> states is present and enabled. All the quirks for pre-entry of stop
> state is handled inside opal. A call from opal is made into kernel
> where we execute stop afer saving of NVGPRs.
> After waking up from 0x100 vector in kernel, we enter back into opal.
> All the quirks in post exit path, if any, are then handled in opal,
> from where we return successfully back to kernel.
> For deep stop states in which additional SPRs are lost, saving and
> restoration will be done in OPAL.
> 
> This idea was first proposed by Nick here:
> https://patchwork.ozlabs.org/patch/1208159/
> 
> The corresponding skiboot patch for this kernel patch is here:
> https://patchwork.ozlabs.org/patch/1265959/
> 
> When we callback from OPAL into kernel, r13 is clobbered. So, to
> access PACA we need to restore it from HSPRGO. In future we can
> handle this into OPAL as in here:
> https://patchwork.ozlabs.org/patch/1245275/
> 
> Signed-off-by: Abhishek Goel 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/include/asm/opal-api.h|  8 -
>  arch/powerpc/include/asm/opal.h|  3 ++
>  arch/powerpc/kernel/idle_book3s.S  |  5 +++
>  arch/powerpc/platforms/powernv/idle.c  | 37 ++
>  arch/powerpc/platforms/powernv/opal-call.c |  2 ++
>  5 files changed, 54 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index c1f25a760eb1..a2c782c99c9e 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -214,7 +214,9 @@
>  #define OPAL_SECVAR_GET  176
>  #define OPAL_SECVAR_GET_NEXT 177
>  #define OPAL_SECVAR_ENQUEUE_UPDATE   178
> -#define OPAL_LAST178
> +#define OPAL_REGISTER_OS_OPS 181
> +#define OPAL_CPU_IDLE182
> +#define OPAL_LAST182
> 
>  #define QUIESCE_HOLD 1 /* Spin all calls at entry */
>  #define QUIESCE_REJECT   2 /* Fail all calls with 
> OPAL_BUSY */
> @@ -1181,6 +1183,10 @@ struct opal_mpipl_fadump {
>   struct  opal_mpipl_region region[];
>  } __packed;
> 
> +struct opal_os_ops {
> + __be64 os_idle_stop;
> +};
> +
>  #endif /* __ASSEMBLY__ */
> 
>  #endif /* __OPAL_API_H */
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 9986ac34b8e2..3c340bc4df8e 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -400,6 +400,9 @@ void opal_powercap_init(void);
>  void opal_psr_init(void);
>  void opal_sensor_groups_init(void);
> 
> +extern int64_t opal_register_os_ops(struct opal_os_ops *os_ops);
> +extern int64_t opal_cpu_idle(__be64 srr1_addr, uint64_t psscr);
> +
>  #endif /* __ASSEMBLY__ */
> 
>  #endif /* _ASM_POWERPC_OPAL_H */
> diff --git a/arch/powerpc/kernel/idle_book3s.S 
> b/arch/powerpc/kernel/idle_book3s.S
> index 22f249b6f58d..8d287d1d06c0 100644
> --- a/arch/powerpc/kernel/idle_book3s.S
> +++ b/arch/powerpc/kernel/idle_book3s.S
> @@ -49,6 +49,8 @@ _GLOBAL(isa300_idle_stop_noloss)
>   */
>  _GLOBAL(isa300_idle_stop_mayloss)
>   mtspr   SPRN_PSSCR,r3
> + mr  r6, r13
> + mfspr   r13, SPRN_HSPRG0
>   std r1,PACAR1(r13)
>   mflrr4
>   mfcrr5
> @@ -74,6 +76,7 @@ _GLOBAL(isa300_idle_stop_mayloss)
>   std r31,-8*18(r1)
>   std r4,-8*19(r1)
>   std r5,-8*20(r1)
> + std r6,-8*21(r1)
>   /* 168 bytes */
>   PPC_STOP
>   b   .   /* catch bugs */
> @@ -91,8 +94,10 @@ _GLOBAL(idle_return_gpr_loss)
>   ld  r1,PACAR1(r13)
>   ld  r4,-8*19(r1)
>   ld  r5,-8*20(r1)
> + ld  r6,-8*21(r1)
>   mtlrr4
>   mtcrr5
> + mr  r13,r6
>   /*
>* KVM nap requires r2 to be saved, rather than just restoring it
>* from PACATOC. This could be avoided for that less common case
> diff --git a/arch/powerpc/platforms/powernv/idle.c 
> b/arch/powerpc/platforms/powernv/idle.c
> index 78599bca66c2..1841027b25c5 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -35,6 +35,7 @@
>  static u32 supported_cpuidle_states;
>  struct pnv_idle_states_t *pnv_idle_states;
>  int nr_pnv_idle_states;
> +static bool firmware_stop_supported;
> 
>  /*
>   * The default stop state that will be used by ppc_md.power_save
> @@ -602,6 +603,25 @@ struct p9_sprs {
>   u64 uamor;
>  };
> 
> +/*
> + * This function is called from OPAL if fi

[PATCH] powerpc/powernv: Add a print indicating when an IODA PE is released

2020-04-08 Thread Oliver O'Halloran
Quite useful to know in some cases.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3d81c01..82e5098 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -3475,6 +3475,8 @@ static void pnv_ioda_release_pe(struct pnv_ioda_pe *pe)
struct pnv_phb *phb = pe->phb;
struct pnv_ioda_pe *slave, *tmp;
 
+   pe_info(pe, "Releasing PE\n");
+
mutex_lock(&phb->ioda.pe_list_mutex);
list_del(&pe->list);
mutex_unlock(&phb->ioda.pe_list_mutex);
-- 
2.9.5



Re: [PATCH v2] powerpc/ptrace: Do not return ENOSYS if invalid syscall

2020-04-08 Thread Michael Ellerman
Hi Cascardo,

Thanks for following-up on this.

Unfortunately I don't think I can merge this fix.

Thadeu Lima de Souza Cascardo  writes:
> If a tracer sets the syscall number to an invalid one, allow the return
> value set by the tracer to be returned the tracee.

The problem is this patch not only *allows* the tracer to set the return
value, but it also *requires* the tracer to set the return value. That
would be a change to the ABI.

Currently if a tracer sets the syscall number to -1, that's all they
need to do, and the kernel will make sure ENOSYS is returned to the
tracee.

With this patch applied the tracer can set the syscall to -1 but they
also must set the return value explicitly. Otherwise the syscall will
just return with whatever value happens to be in r3.

I confirmed this patch breaks the strace testsuite:

  # cd strace/tests/
  # bash qual_inject-retval.test
  ../../strace: Failed to tamper with process 13301: unexpectedly got no error 
(return value 0x10001090, error 0)
  expected retval 0, got retval 268439696
  chdir("..") = 268439696 (INJECTED)
  +++ exited with 1 +++
  qual_inject-retval.test: failed test: ../../strace -a12 -echdir 
-einject=chdir:retval=0 ../qual_inject-retval 0 failed with code 1

The return value 0x10001090 is the address of the ".." string passed to
the syscall.

> The test for NR_syscalls is already at entry_64.S, and it's at
> do_syscall_trace_enter only to skip audit and trace.
>
> After this, two failures from seccomp_bpf selftests complete just fine,
> as the failing test was using ptrace to change the syscall to return an
> error or a fake value, but were failing as it was always returning
> -ENOSYS.

This test wants to change the syscall number and the return value, and
do both from the syscall enter hook.

We don't support that, because we have no way of knowing if the tracer
set the return value, so we always return ENOSYS. Our ptrace ABI has
been that way forever.

We could possibly do something like compare r3 and orig_gpr3 and assume
that if they're different then the tracer has set r3 to the return
value. But I worry that will break something and/or just be very subtle
and bug prone.

I think the right way to fix it is for the test case to change the
return value from the syscall exit hook. That will work on all existing
kernels AFAIK. It's also what strace does.

cheers


> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 25c0424e8868..557ae4bc2331 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -3314,7 +3314,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
> 
>   /* Avoid trace and audit when syscall is invalid. */
>   if (regs->gpr[0] >= NR_syscalls)
> - goto skip;
> + return regs->gpr[0];
> 
>   if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
>   trace_sys_enter(regs, regs->gpr[0]);


Re: [PATCH v2 0/2] Don't generate thousands of new warnings when building docs

2020-04-08 Thread Mauro Carvalho Chehab
Em Tue, 07 Apr 2020 13:46:23 +1000
Michael Ellerman  escreveu:

> Mauro Carvalho Chehab  writes:
> > This small series address a regression caused by a new patch at
> > docs-next (and at linux-next).
> >

...

> > This solves almost all problems we have. Still, there are a few places
> > where we have two chapters at the same document with the
> > same name. The first patch addresses this problem.  
> 
> I'm still seeing a lot of warnings. Am I doing something wrong?
> 
> cheers
> 
> /linux/Documentation/powerpc/cxl.rst:406: WARNING: duplicate label 
> powerpc/cxl:open, other instance in /linux/Documentation/powerpc/cxl.rst
...
> /linux/Documentation/powerpc/syscall64-abi.rst:86: WARNING: duplicate label 
> powerpc/syscall64-abi:parameters and return value, other instance in 
> /linux/Documentation/powerpc/syscall64-abi.rst
...
> /linux/Documentation/powerpc/ultravisor.rst:339: WARNING: duplicate label 
> powerpc/ultravisor:syntax, other instance in 
> /linux/Documentation/powerpc/ultravisor.rst
...

I can't reproduce your issue here at linux-next (+ my pending doc patches).

So, I can only provide you some hints.

If you see the logs you posted, all of them are related to duplicated
labels inside the same file.

-

The new Sphinx module we're using (sphinx.ext.autosectionlabel) generates
references for two levels, within the same document file (after this patch).


Looking at the first document (at linux-next version), it has:

1) A first level document title:

   Coherent Accelerator Interface (CXL)

2) Several second level titles:

   Introduction
   Hardware overview
   AFU Modes
   MMIO space
   Interrupts
   Work Element Descriptor (WED)
   User API
   Sysfs Class
   Udev rules

Right now, there's no duplication, but if someone adds, for example, 
another first-level or second-level title called "Interrupts", then 
the file will produce a duplicated label and Sphinx will warn.

The same would happen if someone adds another title (either first
level or second level) called "Coherent Accelerator Interface (CXL)",
as this will conflict with the document title.

-

Now, if the title "Coherent Accelerator Interface (CXL)" got removed,
then "Introduction".."Udev rules" will become first level titles.

Then, the sections at the "User API": "open", "ioctl"... will become
second level titles and it will produce lots of warnings.

-

That's said, IMHO, this document needs section titles for the two
sections under "User API". Adding it would allow removing the document
title. See enclosed.

Thanks,
Mauro

powerpc: docs: cxl.rst: mark two section titles as such

The User API chapter contains two sub-chapters. Mark them as
such.

Signed-off-by: Mauro Carvalho Chehab 


diff --git a/Documentation/powerpc/cxl.rst b/Documentation/powerpc/cxl.rst
index 920546d81326..d2d77057610e 100644
--- a/Documentation/powerpc/cxl.rst
+++ b/Documentation/powerpc/cxl.rst
@@ -133,6 +133,7 @@ User API
 
 
 1. AFU character devices
+
 
 For AFUs operating in AFU directed mode, two character device
 files will be created. /dev/cxl/afu0.0m will correspond to a
@@ -395,6 +396,7 @@ read
 
 
 2. Card character device (powerVM guest only)
+^
 
 In a powerVM guest, an extra character device is created for the
 card. The device is only used to write (flash) a new image on the



decruft the vmalloc API

2020-04-08 Thread Christoph Hellwig
Hi all,

Peter noticed that with some dumb luck you can toast the kernel address
space with exported vmalloc symbols.

I used this as an opportunity to decruft the vmalloc.c API and make it
much more systematic.  This also removes any chance to create vmalloc
mappings outside the designated areas or using executable permissions
from modules.  Besides that it removes more than 300 lines of code.

A git tree is also available here:

git://git.infradead.org/users/hch/misc.git sanitize-vmalloc-api

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/sanitize-vmalloc-api


[PATCH 01/28] x86/hyperv: use vmalloc_exec for the hypercall page

2020-04-08 Thread Christoph Hellwig
Use the designated helper for allocating executable kernel memory, and
remove the now unused PAGE_KERNEL_RX define.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/hyperv/hv_init.c| 2 +-
 arch/x86/include/asm/pgtable_types.h | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b0da5320bcff..5a4b363ba67b 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -355,7 +355,7 @@ void __init hyperv_init(void)
guest_id = generate_guest_id(0, LINUX_VERSION_CODE, 0);
wrmsrl(HV_X64_MSR_GUEST_OS_ID, guest_id);
 
-   hv_hypercall_pg  = __vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL_RX);
+   hv_hypercall_pg = vmalloc_exec(PAGE_SIZE);
if (hv_hypercall_pg == NULL) {
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
goto remove_cpuhp_state;
diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index b6606fe6cfdf..947867f112ea 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -194,7 +194,6 @@ enum page_cache_mode {
 #define _PAGE_TABLE_NOENC   (__PP|__RW|_USR|___A|   0|___D|   0|   0)
 #define _PAGE_TABLE (__PP|__RW|_USR|___A|   0|___D|   0|   0| _ENC)
 #define __PAGE_KERNEL_RO(__PP|   0|   0|___A|__NX|___D|   0|___G)
-#define __PAGE_KERNEL_RX(__PP|   0|   0|___A|   0|___D|   0|___G)
 #define __PAGE_KERNEL_NOCACHE   (__PP|__RW|   0|___A|__NX|___D|   0|___G| __NC)
 #define __PAGE_KERNEL_VVAR  (__PP|   0|_USR|___A|__NX|___D|   0|___G)
 #define __PAGE_KERNEL_LARGE (__PP|__RW|   0|___A|__NX|___D|_PSE|___G)
@@ -220,7 +219,6 @@ enum page_cache_mode {
 #define PAGE_KERNEL_RO __pgprot_mask(__PAGE_KERNEL_RO | _ENC)
 #define PAGE_KERNEL_EXEC   __pgprot_mask(__PAGE_KERNEL_EXEC   | _ENC)
 #define PAGE_KERNEL_EXEC_NOENC __pgprot_mask(__PAGE_KERNEL_EXEC   |0)
-#define PAGE_KERNEL_RX __pgprot_mask(__PAGE_KERNEL_RX | _ENC)
 #define PAGE_KERNEL_NOCACHE__pgprot_mask(__PAGE_KERNEL_NOCACHE| _ENC)
 #define PAGE_KERNEL_LARGE  __pgprot_mask(__PAGE_KERNEL_LARGE  | _ENC)
 #define PAGE_KERNEL_LARGE_EXEC __pgprot_mask(__PAGE_KERNEL_LARGE_EXEC | _ENC)
-- 
2.25.1



[PATCH 02/28] staging: android: ion: use vmap instead of vm_map_ram

2020-04-08 Thread Christoph Hellwig
vm_map_ram can keep mappings around after the vm_unmap_ram.  Using that
with non-PAGE_KERNEL mappings can lead to all kinds of aliasing issues.

Signed-off-by: Christoph Hellwig 
---
 drivers/staging/android/ion/ion_heap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/ion/ion_heap.c 
b/drivers/staging/android/ion/ion_heap.c
index 473b465724f1..a2d5c6df4b96 100644
--- a/drivers/staging/android/ion/ion_heap.c
+++ b/drivers/staging/android/ion/ion_heap.c
@@ -99,12 +99,12 @@ int ion_heap_map_user(struct ion_heap *heap, struct 
ion_buffer *buffer,
 
 static int ion_heap_clear_pages(struct page **pages, int num, pgprot_t pgprot)
 {
-   void *addr = vm_map_ram(pages, num, -1, pgprot);
+   void *addr = vmap(pages, num, VM_MAP);
 
if (!addr)
return -ENOMEM;
memset(addr, 0, PAGE_SIZE * num);
-   vm_unmap_ram(addr, num);
+   vunmap(addr);
 
return 0;
 }
-- 
2.25.1



[PATCH 03/28] staging: media: ipu3: use vmap insted of reimplementing it

2020-04-08 Thread Christoph Hellwig
Just use vmap instead of messing with vmalloc internals.

Signed-off-by: Christoph Hellwig 
---
 drivers/staging/media/ipu3/ipu3-css-pool.h |  4 +--
 drivers/staging/media/ipu3/ipu3-dmamap.c   | 30 ++
 2 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/drivers/staging/media/ipu3/ipu3-css-pool.h 
b/drivers/staging/media/ipu3/ipu3-css-pool.h
index f4a60b41401b..a8ccd4f70320 100644
--- a/drivers/staging/media/ipu3/ipu3-css-pool.h
+++ b/drivers/staging/media/ipu3/ipu3-css-pool.h
@@ -15,14 +15,12 @@ struct imgu_device;
  * @size:  size of the buffer in bytes.
  * @vaddr: kernel virtual address.
  * @daddr: iova dma address to access IPU3.
- * @vma:   private, a pointer to &struct vm_struct,
- * used for imgu_dmamap_free.
  */
 struct imgu_css_map {
size_t size;
void *vaddr;
dma_addr_t daddr;
-   struct vm_struct *vma;
+   struct page **pages;
 };
 
 /**
diff --git a/drivers/staging/media/ipu3/ipu3-dmamap.c 
b/drivers/staging/media/ipu3/ipu3-dmamap.c
index 7431322379f6..8a19b0024152 100644
--- a/drivers/staging/media/ipu3/ipu3-dmamap.c
+++ b/drivers/staging/media/ipu3/ipu3-dmamap.c
@@ -96,6 +96,7 @@ void *imgu_dmamap_alloc(struct imgu_device *imgu, struct 
imgu_css_map *map,
unsigned long shift = iova_shift(&imgu->iova_domain);
struct device *dev = &imgu->pci_dev->dev;
size_t size = PAGE_ALIGN(len);
+   int count = size >> PAGE_SHIFT;
struct page **pages;
dma_addr_t iovaddr;
struct iova *iova;
@@ -114,7 +115,7 @@ void *imgu_dmamap_alloc(struct imgu_device *imgu, struct 
imgu_css_map *map,
 
/* Call IOMMU driver to setup pgt */
iovaddr = iova_dma_addr(&imgu->iova_domain, iova);
-   for (i = 0; i < size / PAGE_SIZE; ++i) {
+   for (i = 0; i < count; ++i) {
rval = imgu_mmu_map(imgu->mmu, iovaddr,
page_to_phys(pages[i]), PAGE_SIZE);
if (rval)
@@ -123,33 +124,23 @@ void *imgu_dmamap_alloc(struct imgu_device *imgu, struct 
imgu_css_map *map,
iovaddr += PAGE_SIZE;
}
 
-   /* Now grab a virtual region */
-   map->vma = __get_vm_area(size, VM_USERMAP, VMALLOC_START, VMALLOC_END);
-   if (!map->vma)
+   map->vaddr = vmap(pages, count, VM_USERMAP, PAGE_KERNEL);
+   if (!map->vaddr)
goto out_unmap;
 
-   map->vma->pages = pages;
-   /* And map it in KVA */
-   if (map_vm_area(map->vma, PAGE_KERNEL, pages))
-   goto out_vunmap;
-
+   map->pages = pages;
map->size = size;
map->daddr = iova_dma_addr(&imgu->iova_domain, iova);
-   map->vaddr = map->vma->addr;
 
dev_dbg(dev, "%s: allocated %zu @ IOVA %pad @ VA %p\n", __func__,
-   size, &map->daddr, map->vma->addr);
-
-   return map->vma->addr;
+   size, &map->daddr, map->vaddr);
 
-out_vunmap:
-   vunmap(map->vma->addr);
+   return map->vaddr;
 
 out_unmap:
imgu_dmamap_free_buffer(pages, size);
imgu_mmu_unmap(imgu->mmu, iova_dma_addr(&imgu->iova_domain, iova),
   i * PAGE_SIZE);
-   map->vma = NULL;
 
 out_free_iova:
__free_iova(&imgu->iova_domain, iova);
@@ -177,8 +168,6 @@ void imgu_dmamap_unmap(struct imgu_device *imgu, struct 
imgu_css_map *map)
  */
 void imgu_dmamap_free(struct imgu_device *imgu, struct imgu_css_map *map)
 {
-   struct vm_struct *area = map->vma;
-
dev_dbg(&imgu->pci_dev->dev, "%s: freeing %zu @ IOVA %pad @ VA %p\n",
__func__, map->size, &map->daddr, map->vaddr);
 
@@ -187,11 +176,8 @@ void imgu_dmamap_free(struct imgu_device *imgu, struct 
imgu_css_map *map)
 
imgu_dmamap_unmap(imgu, map);
 
-   if (WARN_ON(!area) || WARN_ON(!area->pages))
-   return;
-
-   imgu_dmamap_free_buffer(area->pages, map->size);
vunmap(map->vaddr);
+   imgu_dmamap_free_buffer(map->pages, map->size);
map->vaddr = NULL;
 }
 
-- 
2.25.1



[PATCH 04/28] dma-mapping: use vmap insted of reimplementing it

2020-04-08 Thread Christoph Hellwig
Replace the open coded instance of vmap with the actual function.  In
the non-contiguous (IOMMU) case this requires an extra find_vm_area,
but given that this isn't a fast path function that is a small price
to pay.

Signed-off-by: Christoph Hellwig 
---
 kernel/dma/remap.c | 48 --
 1 file changed, 12 insertions(+), 36 deletions(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index d14cbc83986a..7a8ba60951e8 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -20,23 +20,6 @@ struct page **dma_common_find_pages(void *cpu_addr)
return area->pages;
 }
 
-static struct vm_struct *__dma_common_pages_remap(struct page **pages,
-   size_t size, pgprot_t prot, const void *caller)
-{
-   struct vm_struct *area;
-
-   area = get_vm_area_caller(size, VM_DMA_COHERENT, caller);
-   if (!area)
-   return NULL;
-
-   if (map_vm_area(area, prot, pages)) {
-   vunmap(area->addr);
-   return NULL;
-   }
-
-   return area;
-}
-
 /*
  * Remaps an array of PAGE_SIZE pages into another vm_area.
  * Cannot be used in non-sleeping contexts
@@ -44,15 +27,12 @@ static struct vm_struct *__dma_common_pages_remap(struct 
page **pages,
 void *dma_common_pages_remap(struct page **pages, size_t size,
 pgprot_t prot, const void *caller)
 {
-   struct vm_struct *area;
+   void *vaddr;
 
-   area = __dma_common_pages_remap(pages, size, prot, caller);
-   if (!area)
-   return NULL;
-
-   area->pages = pages;
-
-   return area->addr;
+   vaddr = vmap(pages, count, VM_DMA_COHERENT, prot);
+   if (vaddr)
+   find_vm_area(vaddr)->pages = pages;
+   return vaddr;
 }
 
 /*
@@ -62,24 +42,20 @@ void *dma_common_pages_remap(struct page **pages, size_t 
size,
 void *dma_common_contiguous_remap(struct page *page, size_t size,
pgprot_t prot, const void *caller)
 {
-   int i;
+   int count = size >> PAGE_SHIFT;
struct page **pages;
-   struct vm_struct *area;
+   void *vaddr;
+   int i;
 
-   pages = kmalloc(sizeof(struct page *) << get_order(size), GFP_KERNEL);
+   pages = kmalloc_array(count, sizeof(struct page *), GFP_KERNEL);
if (!pages)
return NULL;
-
-   for (i = 0; i < (size >> PAGE_SHIFT); i++)
+   for (i = 0; i < count; i++)
pages[i] = nth_page(page, i);
-
-   area = __dma_common_pages_remap(pages, size, prot, caller);
-
+   vaddr = vmap(pages, count, VM_DMA_COHERENT, prot);
kfree(pages);
 
-   if (!area)
-   return NULL;
-   return area->addr;
+   return vaddr;
 }
 
 /*
-- 
2.25.1



[PATCH 06/28] powerpc: remove __ioremap_at and __iounmap_at

2020-04-08 Thread Christoph Hellwig
These helpers are only used for remapping the ISA I/O base.  Replace
the mapping side with a remap_isa_range helper in isa-bridge.c that
hard codes all the known arguments, and just remove __iounmap_at in
favour of open coding it in the only caller.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/io.h|  8 -
 arch/powerpc/kernel/isa-bridge.c | 28 +-
 arch/powerpc/mm/ioremap_64.c | 50 
 3 files changed, 21 insertions(+), 65 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 71f1c5d69839..4fdbb9e45dd7 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -699,10 +699,6 @@ static inline void iosync(void)
  *
  * * iounmap undoes such a mapping and can be hooked
  *
- * * __ioremap_at (and the pending __iounmap_at) are low level functions to
- *   create hand-made mappings for use only by the PCI code and cannot
- *   currently be hooked. Must be page aligned.
- *
  * * __ioremap_caller is the same as above but takes an explicit caller
  *   reference rather than using __builtin_return_address(0)
  *
@@ -729,10 +725,6 @@ void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t 
offset, unsigned long size,
 extern void __iomem *__ioremap_caller(phys_addr_t, unsigned long size,
  pgprot_t prot, void *caller);
 
-extern void __iomem * __ioremap_at(phys_addr_t pa, void *ea,
-  unsigned long size, pgprot_t prot);
-extern void __iounmap_at(void *ea, unsigned long size);
-
 /*
  * When CONFIG_PPC_INDIRECT_PIO is set, we use the generic iomap implementation
  * which needs some additional definitions here. They basically allow PIO
diff --git a/arch/powerpc/kernel/isa-bridge.c b/arch/powerpc/kernel/isa-bridge.c
index 773671b512df..2257d24e6a26 100644
--- a/arch/powerpc/kernel/isa-bridge.c
+++ b/arch/powerpc/kernel/isa-bridge.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -38,6 +39,22 @@ EXPORT_SYMBOL_GPL(isa_bridge_pcidev);
 #define ISA_SPACE_MASK 0x1
 #define ISA_SPACE_IO 0x1
 
+static void remap_isa_base(phys_addr_t pa, unsigned long size)
+{
+   WARN_ON_ONCE(ISA_IO_BASE & ~PAGE_MASK);
+   WARN_ON_ONCE(pa & ~PAGE_MASK);
+   WARN_ON_ONCE(size & ~PAGE_MASK);
+
+   if (slab_is_available()) {
+   if (ioremap_page_range(ISA_IO_BASE, ISA_IO_BASE + size, pa,
+   pgprot_noncached(PAGE_KERNEL)))
+   unmap_kernel_range(ISA_IO_BASE, size);
+   } else {
+   early_ioremap_range(ISA_IO_BASE, pa, size,
+   pgprot_noncached(PAGE_KERNEL));
+   }
+}
+
 static void pci_process_ISA_OF_ranges(struct device_node *isa_node,
  unsigned long phb_io_base_phys)
 {
@@ -105,15 +122,13 @@ static void pci_process_ISA_OF_ranges(struct device_node 
*isa_node,
if (size > 0x1)
size = 0x1;
 
-   __ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE,
-size, pgprot_noncached(PAGE_KERNEL));
+   remap_isa_base(phb_io_base_phys, size);
return;
 
 inval_range:
printk(KERN_ERR "no ISA IO ranges or unexpected isa range, "
   "mapping 64k\n");
-   __ioremap_at(phb_io_base_phys, (void *)ISA_IO_BASE,
-0x1, pgprot_noncached(PAGE_KERNEL));
+   remap_isa_base(phb_io_base_phys, 0x1);
 }
 
 
@@ -248,8 +263,7 @@ void __init isa_bridge_init_non_pci(struct device_node *np)
 * and map it
 */
isa_io_base = ISA_IO_BASE;
-   __ioremap_at(pbase, (void *)ISA_IO_BASE,
-size, pgprot_noncached(PAGE_KERNEL));
+   remap_isa_base(pbase, size);
 
pr_debug("ISA: Non-PCI bridge is %pOF\n", np);
 }
@@ -297,7 +311,7 @@ static void isa_bridge_remove(void)
isa_bridge_pcidev = NULL;
 
/* Unmap the ISA area */
-   __iounmap_at((void *)ISA_IO_BASE, 0x1);
+   unmap_kernel_range(ISA_IO_BASE, 0x1);
 }
 
 /**
diff --git a/arch/powerpc/mm/ioremap_64.c b/arch/powerpc/mm/ioremap_64.c
index 50a99d9684f7..ba5cbb0d66bd 100644
--- a/arch/powerpc/mm/ioremap_64.c
+++ b/arch/powerpc/mm/ioremap_64.c
@@ -4,56 +4,6 @@
 #include 
 #include 
 
-/**
- * Low level function to establish the page tables for an IO mapping
- */
-void __iomem *__ioremap_at(phys_addr_t pa, void *ea, unsigned long size, 
pgprot_t prot)
-{
-   int ret;
-   unsigned long va = (unsigned long)ea;
-
-   /* We don't support the 4K PFN hack with ioremap */
-   if (pgprot_val(prot) & H_PAGE_4K_PFN)
-   return NULL;
-
-   if ((ea + size) >= (void *)IOREMAP_END) {
-   pr_warn("Outside the supported range\n");
-   return NULL;
-   }
-
-   WARN_ON(pa & ~PAGE_MASK);
-   WARN_ON(((unsigned long)ea) & ~PAGE_MASK);
-   WARN_ON(size & ~PAGE_MASK);
-
-   if (slab

[PATCH 05/28] powerpc: add an ioremap_phb helper

2020-04-08 Thread Christoph Hellwig
Factor code shared between pci_64 and electra_cf into a ioremap_pbh
helper that follows the normal ioremap semantics, and returns a
useful __iomem pointer.  Note that it opencodes __ioremap_at as
we know from the callers the slab is available.  Switch pci_64
to also store the result as __iomem pointer, and unmap the result
using iounmap instead of force casting and using vmalloc APIs.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/include/asm/io.h |  2 +
 arch/powerpc/include/asm/pci-bridge.h |  2 +-
 arch/powerpc/kernel/pci_64.c  | 53 ++-
 drivers/pcmcia/electra_cf.c   | 45 ---
 4 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 635969b5b58e..71f1c5d69839 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -719,6 +719,8 @@ void __iomem *ioremap_coherent(phys_addr_t address, 
unsigned long size);
 
 extern void iounmap(volatile void __iomem *addr);
 
+void __iomem *ioremap_pbh(phys_addr_t paddr, unsigned long size);
+
 int early_ioremap_range(unsigned long ea, phys_addr_t pa,
unsigned long size, pgprot_t prot);
 void __iomem *do_ioremap(phys_addr_t pa, phys_addr_t offset, unsigned long 
size,
diff --git a/arch/powerpc/include/asm/pci-bridge.h 
b/arch/powerpc/include/asm/pci-bridge.h
index 69f4cb3b7c56..b92e81b256e5 100644
--- a/arch/powerpc/include/asm/pci-bridge.h
+++ b/arch/powerpc/include/asm/pci-bridge.h
@@ -66,7 +66,7 @@ struct pci_controller {
 
void __iomem *io_base_virt;
 #ifdef CONFIG_PPC64
-   void *io_base_alloc;
+   void __iomem *io_base_alloc;
 #endif
resource_size_t io_base_phys;
resource_size_t pci_io_size;
diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index f83d1f69b1dd..8e86bd9c1eca 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -109,23 +109,46 @@ int pcibios_unmap_io_space(struct pci_bus *bus)
/* Get the host bridge */
hose = pci_bus_to_host(bus);
 
-   /* Check if we have IOs allocated */
-   if (hose->io_base_alloc == NULL)
-   return 0;
-
pr_debug("IO unmapping for PHB %pOF\n", hose->dn);
pr_debug("  alloc=0x%p\n", hose->io_base_alloc);
 
-   /* This is a PHB, we fully unmap the IO area */
-   vunmap(hose->io_base_alloc);
-
+   iounmap(hose->io_base_alloc);
return 0;
 }
 EXPORT_SYMBOL_GPL(pcibios_unmap_io_space);
 
-static int pcibios_map_phb_io_space(struct pci_controller *hose)
+void __iomem *ioremap_pbh(phys_addr_t paddr, unsigned long size)
 {
struct vm_struct *area;
+   unsigned long addr;
+
+   WARN_ON_ONCE(paddr & ~PAGE_MASK);
+   WARN_ON_ONCE(size & ~PAGE_MASK);
+
+   /*
+* Let's allocate some IO space for that guy. We don't pass VM_IOREMAP
+* because we don't care about alignment tricks that the core does in
+* that case.  Maybe we should due to stupid card with incomplete
+* address decoding but I'd rather not deal with those outside of the
+* reserved 64K legacy region.
+*/
+   area = __get_vm_area(size, 0, PHB_IO_BASE, PHB_IO_END);
+   if (!area)
+   return NULL;
+
+   addr = (unsigned long)area->addr;
+   if (ioremap_page_range(addr, addr + size, paddr,
+   pgprot_noncached(PAGE_KERNEL))) {
+   unmap_kernel_range(addr, size);
+   return NULL;
+   }
+
+   return (void __iomem *)addr;
+}
+EXPORT_SYMBOL_GPL(ioremap_pbh);
+
+static int pcibios_map_phb_io_space(struct pci_controller *hose)
+{
unsigned long phys_page;
unsigned long size_page;
unsigned long io_virt_offset;
@@ -146,12 +169,11 @@ static int pcibios_map_phb_io_space(struct pci_controller 
*hose)
 * with incomplete address decoding but I'd rather not deal with
 * those outside of the reserved 64K legacy region.
 */
-   area = __get_vm_area(size_page, 0, PHB_IO_BASE, PHB_IO_END);
-   if (area == NULL)
+   hose->io_base_alloc = ioremap_pbh(phys_page, size_page);
+   if (!hose->io_base_alloc)
return -ENOMEM;
-   hose->io_base_alloc = area->addr;
-   hose->io_base_virt = (void __iomem *)(area->addr +
- hose->io_base_phys - phys_page);
+   hose->io_base_virt = hose->io_base_alloc +
+   hose->io_base_phys - phys_page;
 
pr_debug("IO mapping for PHB %pOF\n", hose->dn);
pr_debug("  phys=0x%016llx, virt=0x%p (alloc=0x%p)\n",
@@ -159,11 +181,6 @@ static int pcibios_map_phb_io_space(struct pci_controller 
*hose)
pr_debug("  size=0x%016llx (alloc=0x%016lx)\n",
 hose->pci_io_size, size_page);
 
-   /* Establish the mapping */
-   if (__ioremap_at(phys_page, area->addr, size_page,
-p

[PATCH 07/28] mm: remove __get_vm_area

2020-04-08 Thread Christoph Hellwig
Switch the two remaining callers to use __get_vm_area_caller instead.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/pci_64.c | 3 ++-
 arch/sh/kernel/cpu/sh4/sq.c  | 3 ++-
 include/linux/vmalloc.h  | 2 --
 mm/vmalloc.c | 8 
 4 files changed, 4 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/pci_64.c b/arch/powerpc/kernel/pci_64.c
index 8e86bd9c1eca..155e2ef60053 100644
--- a/arch/powerpc/kernel/pci_64.c
+++ b/arch/powerpc/kernel/pci_64.c
@@ -132,7 +132,8 @@ void __iomem *ioremap_pbh(phys_addr_t paddr, unsigned long 
size)
 * address decoding but I'd rather not deal with those outside of the
 * reserved 64K legacy region.
 */
-   area = __get_vm_area(size, 0, PHB_IO_BASE, PHB_IO_END);
+   area = __get_vm_area_caller(size, 0, PHB_IO_BASE, PHB_IO_END,
+   __builtin_return_address(0));
if (!area)
return NULL;
 
diff --git a/arch/sh/kernel/cpu/sh4/sq.c b/arch/sh/kernel/cpu/sh4/sq.c
index 934ff84844fa..d432164b23b7 100644
--- a/arch/sh/kernel/cpu/sh4/sq.c
+++ b/arch/sh/kernel/cpu/sh4/sq.c
@@ -103,7 +103,8 @@ static int __sq_remap(struct sq_mapping *map, pgprot_t prot)
 #if defined(CONFIG_MMU)
struct vm_struct *vma;
 
-   vma = __get_vm_area(map->size, VM_ALLOC, map->sq_addr, SQ_ADDRMAX);
+   vma = __get_vm_area_caller(map->size, VM_ALLOC, map->sq_addr,
+   SQ_ADDRMAX, __builtin_return_address(0));
if (!vma)
return -ENOMEM;
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 0507a162ccd0..3070b4dbc2d9 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -161,8 +161,6 @@ static inline size_t get_vm_area_size(const struct 
vm_struct *area)
 extern struct vm_struct *get_vm_area(unsigned long size, unsigned long flags);
 extern struct vm_struct *get_vm_area_caller(unsigned long size,
unsigned long flags, const void 
*caller);
-extern struct vm_struct *__get_vm_area(unsigned long size, unsigned long flags,
-   unsigned long start, unsigned long end);
 extern struct vm_struct *__get_vm_area_caller(unsigned long size,
unsigned long flags,
unsigned long start, unsigned long end,
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 399f219544f7..d1534d610b48 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2127,14 +2127,6 @@ static struct vm_struct *__get_vm_area_node(unsigned 
long size,
return area;
 }
 
-struct vm_struct *__get_vm_area(unsigned long size, unsigned long flags,
-   unsigned long start, unsigned long end)
-{
-   return __get_vm_area_node(size, 1, flags, start, end, NUMA_NO_NODE,
- GFP_KERNEL, __builtin_return_address(0));
-}
-EXPORT_SYMBOL_GPL(__get_vm_area);
-
 struct vm_struct *__get_vm_area_caller(unsigned long size, unsigned long flags,
   unsigned long start, unsigned long end,
   const void *caller)
-- 
2.25.1



[PATCH 08/28] mm: unexport unmap_kernel_range_noflush

2020-04-08 Thread Christoph Hellwig
There are no modular users of this function.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d1534d610b48..3375f9508ef6 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2029,7 +2029,6 @@ void unmap_kernel_range_noflush(unsigned long addr, 
unsigned long size)
 {
vunmap_page_range(addr, addr + size);
 }
-EXPORT_SYMBOL_GPL(unmap_kernel_range_noflush);
 
 /**
  * unmap_kernel_range - unmap kernel VM area and flush cache and TLB
-- 
2.25.1



[PATCH 11/28] mm: pass addr as unsigned long to vb_free

2020-04-08 Thread Christoph Hellwig
Ever use of addr in vb_free casts to unsigned long first, and the caller
has an unsigned long version of the address available anyway.  Just pass
that and avoid all the casts.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 9183fc0d365a..aada9e9144bd 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1664,7 +1664,7 @@ static void *vb_alloc(unsigned long size, gfp_t gfp_mask)
return vaddr;
 }
 
-static void vb_free(const void *addr, unsigned long size)
+static void vb_free(unsigned long addr, unsigned long size)
 {
unsigned long offset;
unsigned long vb_idx;
@@ -1674,24 +1674,22 @@ static void vb_free(const void *addr, unsigned long 
size)
BUG_ON(offset_in_page(size));
BUG_ON(size > PAGE_SIZE*VMAP_MAX_ALLOC);
 
-   flush_cache_vunmap((unsigned long)addr, (unsigned long)addr + size);
+   flush_cache_vunmap(addr, addr + size);
 
order = get_order(size);
 
-   offset = (unsigned long)addr & (VMAP_BLOCK_SIZE - 1);
-   offset >>= PAGE_SHIFT;
+   offset = (addr & (VMAP_BLOCK_SIZE - 1)) >> PAGE_SHIFT;
 
-   vb_idx = addr_to_vb_idx((unsigned long)addr);
+   vb_idx = addr_to_vb_idx(addr);
rcu_read_lock();
vb = radix_tree_lookup(&vmap_block_tree, vb_idx);
rcu_read_unlock();
BUG_ON(!vb);
 
-   vunmap_page_range((unsigned long)addr, (unsigned long)addr + size);
+   vunmap_page_range(addr, addr + size);
 
if (debug_pagealloc_enabled_static())
-   flush_tlb_kernel_range((unsigned long)addr,
-   (unsigned long)addr + size);
+   flush_tlb_kernel_range(addr, addr + size);
 
spin_lock(&vb->lock);
 
@@ -1791,7 +1789,7 @@ void vm_unmap_ram(const void *mem, unsigned int count)
 
if (likely(count <= VMAP_MAX_ALLOC)) {
debug_check_no_locks_freed(mem, size);
-   vb_free(mem, size);
+   vb_free(addr, size);
return;
}
 
-- 
2.25.1



[PATCH 09/28] mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING

2020-04-08 Thread Christoph Hellwig
Rename the Kconfig variable to clarify the scope.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/configs/omap2plus_defconfig | 2 +-
 include/linux/zsmalloc.h | 2 +-
 mm/Kconfig   | 2 +-
 mm/zsmalloc.c| 8 
 4 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/arm/configs/omap2plus_defconfig 
b/arch/arm/configs/omap2plus_defconfig
index 3cc3ca5fa027..583d8abd80a4 100644
--- a/arch/arm/configs/omap2plus_defconfig
+++ b/arch/arm/configs/omap2plus_defconfig
@@ -81,7 +81,7 @@ CONFIG_PARTITION_ADVANCED=y
 CONFIG_BINFMT_MISC=y
 CONFIG_CMA=y
 CONFIG_ZSMALLOC=m
-CONFIG_PGTABLE_MAPPING=y
+CONFIG_ZSMALLOC_PGTABLE_MAPPING=y
 CONFIG_NET=y
 CONFIG_PACKET=y
 CONFIG_UNIX=y
diff --git a/include/linux/zsmalloc.h b/include/linux/zsmalloc.h
index 2219cce81ca4..0fdbf653b173 100644
--- a/include/linux/zsmalloc.h
+++ b/include/linux/zsmalloc.h
@@ -20,7 +20,7 @@
  * zsmalloc mapping modes
  *
  * NOTE: These only make a difference when a mapped object spans pages.
- * They also have no effect when PGTABLE_MAPPING is selected.
+ * They also have no effect when ZSMALLOC_PGTABLE_MAPPING is selected.
  */
 enum zs_mapmode {
ZS_MM_RW, /* normal read-write mapping */
diff --git a/mm/Kconfig b/mm/Kconfig
index 691021492e78..36949a9425b8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -700,7 +700,7 @@ config ZSMALLOC
  returned by an alloc().  This handle must be mapped in order to
  access the allocated space.
 
-config PGTABLE_MAPPING
+config ZSMALLOC_PGTABLE_MAPPING
bool "Use page table mapping to access object in zsmalloc"
depends on ZSMALLOC
help
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 2f836a2b993f..ac0524330b9b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -293,7 +293,7 @@ struct zspage {
 };
 
 struct mapping_area {
-#ifdef CONFIG_PGTABLE_MAPPING
+#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING
struct vm_struct *vm; /* vm area for mapping object that span pages */
 #else
char *vm_buf; /* copy buffer for objects that span pages */
@@ -1113,7 +1113,7 @@ static struct zspage *find_get_zspage(struct size_class 
*class)
return zspage;
 }
 
-#ifdef CONFIG_PGTABLE_MAPPING
+#ifdef CONFIG_ZSMALLOC_PGTABLE_MAPPING
 static inline int __zs_cpu_up(struct mapping_area *area)
 {
/*
@@ -1151,7 +1151,7 @@ static inline void __zs_unmap_object(struct mapping_area 
*area,
unmap_kernel_range(addr, PAGE_SIZE * 2);
 }
 
-#else /* CONFIG_PGTABLE_MAPPING */
+#else /* CONFIG_ZSMALLOC_PGTABLE_MAPPING */
 
 static inline int __zs_cpu_up(struct mapping_area *area)
 {
@@ -1233,7 +1233,7 @@ static void __zs_unmap_object(struct mapping_area *area,
pagefault_enable();
 }
 
-#endif /* CONFIG_PGTABLE_MAPPING */
+#endif /* CONFIG_ZSMALLOC_PGTABLE_MAPPING */
 
 static int zs_cpu_prepare(unsigned int cpu)
 {
-- 
2.25.1



[PATCH 12/28] mm: remove vmap_page_range_noflush and vunmap_page_range

2020-04-08 Thread Christoph Hellwig
These have non-static aliases claled map_kernel_range_noflush and
unmap_kernel_range_noflush that just differ slightly in the calling
conventions that pass addr + size instead of an end.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 98 +---
 1 file changed, 40 insertions(+), 58 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index aada9e9144bd..55df5dc6a9fc 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -127,10 +127,24 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long 
addr, unsigned long end)
} while (p4d++, addr = next, addr != end);
 }
 
-static void vunmap_page_range(unsigned long addr, unsigned long end)
+/**
+ * unmap_kernel_range_noflush - unmap kernel VM area
+ * @addr: start of the VM area to unmap
+ * @size: size of the VM area to unmap
+ *
+ * Unmap PFN_UP(@size) pages at @addr.  The VM area @addr and @size specify
+ * should have been allocated using get_vm_area() and its friends.
+ *
+ * NOTE:
+ * This function does NOT do any cache flushing.  The caller is responsible
+ * for calling flush_cache_vunmap() on to-be-mapped areas before calling this
+ * function and flush_tlb_kernel_range() after.
+ */
+void unmap_kernel_range_noflush(unsigned long addr, unsigned long size)
 {
-   pgd_t *pgd;
+   unsigned long end = addr + size;
unsigned long next;
+   pgd_t *pgd;
 
BUG_ON(addr >= end);
pgd = pgd_offset_k(addr);
@@ -219,18 +233,30 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
return 0;
 }
 
-/*
- * Set up page tables in kva (addr, end). The ptes shall have prot "prot", and
- * will have pfns corresponding to the "pages" array.
+/**
+ * map_kernel_range_noflush - map kernel VM area with the specified pages
+ * @addr: start of the VM area to map
+ * @size: size of the VM area to map
+ * @prot: page protection flags to use
+ * @pages: pages to map
  *
- * Ie. pte at addr+N*PAGE_SIZE shall point to pfn corresponding to pages[N]
+ * Map PFN_UP(@size) pages at @addr.  The VM area @addr and @size specify 
should
+ * have been allocated using get_vm_area() and its friends.
+ *
+ * NOTE:
+ * This function does NOT do any cache flushing.  The caller is responsible for
+ * calling flush_cache_vmap() on to-be-mapped areas before calling this
+ * function.
+ *
+ * RETURNS:
+ * The number of pages mapped on success, -errno on failure.
  */
-static int vmap_page_range_noflush(unsigned long start, unsigned long end,
-  pgprot_t prot, struct page **pages)
+int map_kernel_range_noflush(unsigned long addr, unsigned long size,
+pgprot_t prot, struct page **pages)
 {
-   pgd_t *pgd;
+   unsigned long end = addr + size;
unsigned long next;
-   unsigned long addr = start;
+   pgd_t *pgd;
int err = 0;
int nr = 0;
 
@@ -251,7 +277,7 @@ static int vmap_page_range(unsigned long start, unsigned 
long end,
 {
int ret;
 
-   ret = vmap_page_range_noflush(start, end, prot, pages);
+   ret = map_kernel_range_noflush(start, end - start, prot, pages);
flush_cache_vmap(start, end);
return ret;
 }
@@ -1226,7 +1252,7 @@ EXPORT_SYMBOL_GPL(unregister_vmap_purge_notifier);
  */
 static void unmap_vmap_area(struct vmap_area *va)
 {
-   vunmap_page_range(va->va_start, va->va_end);
+   unmap_kernel_range_noflush(va->va_start, va->va_end - va->va_start);
 }
 
 /*
@@ -1686,7 +1712,7 @@ static void vb_free(unsigned long addr, unsigned long 
size)
rcu_read_unlock();
BUG_ON(!vb);
 
-   vunmap_page_range(addr, addr + size);
+   unmap_kernel_range_noflush(addr, size);
 
if (debug_pagealloc_enabled_static())
flush_tlb_kernel_range(addr, addr + size);
@@ -1984,50 +2010,6 @@ void __init vmalloc_init(void)
vmap_initialized = true;
 }
 
-/**
- * map_kernel_range_noflush - map kernel VM area with the specified pages
- * @addr: start of the VM area to map
- * @size: size of the VM area to map
- * @prot: page protection flags to use
- * @pages: pages to map
- *
- * Map PFN_UP(@size) pages at @addr.  The VM area @addr and @size
- * specify should have been allocated using get_vm_area() and its
- * friends.
- *
- * NOTE:
- * This function does NOT do any cache flushing.  The caller is
- * responsible for calling flush_cache_vmap() on to-be-mapped areas
- * before calling this function.
- *
- * RETURNS:
- * The number of pages mapped on success, -errno on failure.
- */
-int map_kernel_range_noflush(unsigned long addr, unsigned long size,
-pgprot_t prot, struct page **pages)
-{
-   return vmap_page_range_noflush(addr, addr + size, prot, pages);
-}
-
-/**
- * unmap_kernel_range_noflush - unmap kernel VM area
- * @addr: start of the VM area to unmap
- * @size: size of the VM area to unmap
- *
- * Unmap PFN_UP(@size) pages at @addr.  The VM area @addr and @size
- * specify should have been allocated

[PATCH 13/28] mm: rename vmap_page_range to map_kernel_range

2020-04-08 Thread Christoph Hellwig
This matches the map_kernel_range_noflush API.  Also change to pass
a size instead of the end, similar to the noflush version.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 55df5dc6a9fc..a3d810def567 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -272,13 +272,13 @@ int map_kernel_range_noflush(unsigned long addr, unsigned 
long size,
return nr;
 }
 
-static int vmap_page_range(unsigned long start, unsigned long end,
+static int map_kernel_range(unsigned long start, unsigned long size,
   pgprot_t prot, struct page **pages)
 {
int ret;
 
-   ret = map_kernel_range_noflush(start, end - start, prot, pages);
-   flush_cache_vmap(start, end);
+   ret = map_kernel_range_noflush(start, size, prot, pages);
+   flush_cache_vmap(start, start + size);
return ret;
 }
 
@@ -1866,7 +1866,7 @@ void *vm_map_ram(struct page **pages, unsigned int count, 
int node, pgprot_t pro
 
kasan_unpoison_vmalloc(mem, size);
 
-   if (vmap_page_range(addr, addr + size, prot, pages) < 0) {
+   if (map_kernel_range(addr, size, prot, pages) < 0) {
vm_unmap_ram(mem, count);
return NULL;
}
@@ -2030,10 +2030,9 @@ void unmap_kernel_range(unsigned long addr, unsigned 
long size)
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
 {
unsigned long addr = (unsigned long)area->addr;
-   unsigned long end = addr + get_vm_area_size(area);
int err;
 
-   err = vmap_page_range(addr, end, prot, pages);
+   err = map_kernel_range(addr, get_vm_area_size(area), prot, pages);
 
return err > 0 ? 0 : err;
 }
-- 
2.25.1



[PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Christoph Hellwig
This allows to unexport map_vm_area and unmap_kernel_range, which are
rather deep internal and should not be available to modules.

Signed-off-by: Christoph Hellwig 
---
 mm/Kconfig   | 2 +-
 mm/vmalloc.c | 2 --
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 36949a9425b8..614cc786b519 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -702,7 +702,7 @@ config ZSMALLOC
 
 config ZSMALLOC_PGTABLE_MAPPING
bool "Use page table mapping to access object in zsmalloc"
-   depends on ZSMALLOC
+   depends on ZSMALLOC=y
help
  By default, zsmalloc uses a copy-based object mapping method to
  access allocations that span two pages. However, if a particular
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3375f9508ef6..9183fc0d365a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2046,7 +2046,6 @@ void unmap_kernel_range(unsigned long addr, unsigned long 
size)
vunmap_page_range(addr, end);
flush_tlb_kernel_range(addr, end);
 }
-EXPORT_SYMBOL_GPL(unmap_kernel_range);
 
 int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
 {
@@ -2058,7 +2057,6 @@ int map_vm_area(struct vm_struct *area, pgprot_t prot, 
struct page **pages)
 
return err > 0 ? 0 : err;
 }
-EXPORT_SYMBOL_GPL(map_vm_area);
 
 static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
struct vmap_area *va, unsigned long flags, const void *caller)
-- 
2.25.1



[PATCH 14/28] mm: don't return the number of pages from map_kernel_range{, _noflush}

2020-04-08 Thread Christoph Hellwig
None of the callers needs the number of pages, and a 0 / -errno return
value is a lot more intuitive.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index a3d810def567..ca8dc5d42580 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -249,7 +249,7 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long addr,
  * function.
  *
  * RETURNS:
- * The number of pages mapped on success, -errno on failure.
+ * 0 on success, -errno on failure.
  */
 int map_kernel_range_noflush(unsigned long addr, unsigned long size,
 pgprot_t prot, struct page **pages)
@@ -269,7 +269,7 @@ int map_kernel_range_noflush(unsigned long addr, unsigned 
long size,
return err;
} while (pgd++, addr = next, addr != end);
 
-   return nr;
+   return 0;
 }
 
 static int map_kernel_range(unsigned long start, unsigned long size,
-- 
2.25.1



[PATCH 15/28] mm: remove map_vm_range

2020-04-08 Thread Christoph Hellwig
Switch all callers to map_kernel_range, which symmetric to the unmap
side (as well as the _noflush versions).

Signed-off-by: Christoph Hellwig 
---
 Documentation/core-api/cachetlb.rst |  2 +-
 include/linux/vmalloc.h | 10 --
 mm/vmalloc.c| 21 +++--
 mm/zsmalloc.c   |  4 +++-
 net/ceph/ceph_common.c  |  3 +--
 5 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/Documentation/core-api/cachetlb.rst 
b/Documentation/core-api/cachetlb.rst
index 93cb65d52720..a1582cc79f0f 100644
--- a/Documentation/core-api/cachetlb.rst
+++ b/Documentation/core-api/cachetlb.rst
@@ -213,7 +213,7 @@ Here are the routines, one by one:
there will be no entries in the cache for the kernel address
space for virtual addresses in the range 'start' to 'end-1'.
 
-   The first of these two routines is invoked after map_vm_area()
+   The first of these two routines is invoked after map_kernel_range()
has installed the page table entries.  The second is invoked
before unmap_kernel_range() deletes the page table entries.
 
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 3070b4dbc2d9..15ffbd8e8e65 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -168,11 +168,11 @@ extern struct vm_struct *__get_vm_area_caller(unsigned 
long size,
 extern struct vm_struct *remove_vm_area(const void *addr);
 extern struct vm_struct *find_vm_area(const void *addr);
 
-extern int map_vm_area(struct vm_struct *area, pgprot_t prot,
-   struct page **pages);
 #ifdef CONFIG_MMU
 extern int map_kernel_range_noflush(unsigned long start, unsigned long size,
pgprot_t prot, struct page **pages);
+int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot,
+   struct page **pages);
 extern void unmap_kernel_range_noflush(unsigned long addr, unsigned long size);
 extern void unmap_kernel_range(unsigned long addr, unsigned long size);
 static inline void set_vm_flush_reset_perms(void *addr)
@@ -189,14 +189,12 @@ map_kernel_range_noflush(unsigned long start, unsigned 
long size,
 {
return size >> PAGE_SHIFT;
 }
+#define map_kernel_range map_kernel_range_noflush
 static inline void
 unmap_kernel_range_noflush(unsigned long addr, unsigned long size)
 {
 }
-static inline void
-unmap_kernel_range(unsigned long addr, unsigned long size)
-{
-}
+#define unmap_kernel_range unmap_kernel_range_noflush
 static inline void set_vm_flush_reset_perms(void *addr)
 {
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ca8dc5d42580..b0c7cdc8701a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -272,8 +272,8 @@ int map_kernel_range_noflush(unsigned long addr, unsigned 
long size,
return 0;
 }
 
-static int map_kernel_range(unsigned long start, unsigned long size,
-  pgprot_t prot, struct page **pages)
+int map_kernel_range(unsigned long start, unsigned long size, pgprot_t prot,
+   struct page **pages)
 {
int ret;
 
@@ -2027,16 +2027,6 @@ void unmap_kernel_range(unsigned long addr, unsigned 
long size)
flush_tlb_kernel_range(addr, end);
 }
 
-int map_vm_area(struct vm_struct *area, pgprot_t prot, struct page **pages)
-{
-   unsigned long addr = (unsigned long)area->addr;
-   int err;
-
-   err = map_kernel_range(addr, get_vm_area_size(area), prot, pages);
-
-   return err > 0 ? 0 : err;
-}
-
 static inline void setup_vmalloc_vm_locked(struct vm_struct *vm,
struct vmap_area *va, unsigned long flags, const void *caller)
 {
@@ -2408,7 +2398,8 @@ void *vmap(struct page **pages, unsigned int count,
if (!area)
return NULL;
 
-   if (map_vm_area(area, prot, pages)) {
+   if (map_kernel_range((unsigned long)area->addr, size, prot,
+   pages) < 0) {
vunmap(area->addr);
return NULL;
}
@@ -2471,8 +2462,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, 
gfp_t gfp_mask,
}
atomic_long_add(area->nr_pages, &nr_vmalloc_pages);
 
-   if (map_vm_area(area, prot, pages))
+   if (map_kernel_range((unsigned long)area->addr, get_vm_area_size(area),
+   prot, pages) < 0)
goto fail;
+
return area->addr;
 
 fail:
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index ac0524330b9b..f6dc0673e62c 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1138,7 +1138,9 @@ static inline void __zs_cpu_down(struct mapping_area 
*area)
 static inline void *__zs_map_object(struct mapping_area *area,
struct page *pages[2], int off, int size)
 {
-   BUG_ON(map_vm_area(area->vm, PAGE_KERNEL, pages));
+   unsigned long addr = (unsigned long)area->vm->addr;
+
+   BUG_ON(map_kernel_range(addr, PAGE_SIZE * 2, PAGE_KERNEL, pages) < 0);
area->vm_addr = area->vm->addr;
 

[PATCH 16/28] mm: remove unmap_vmap_area

2020-04-08 Thread Christoph Hellwig
This function just has a single caller, open code it there.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index b0c7cdc8701a..258220b203f1 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1247,14 +1247,6 @@ int unregister_vmap_purge_notifier(struct notifier_block 
*nb)
 }
 EXPORT_SYMBOL_GPL(unregister_vmap_purge_notifier);
 
-/*
- * Clear the pagetable entries of a given vmap_area
- */
-static void unmap_vmap_area(struct vmap_area *va)
-{
-   unmap_kernel_range_noflush(va->va_start, va->va_end - va->va_start);
-}
-
 /*
  * lazy_max_pages is the maximum amount of virtual address space we gather up
  * before attempting to purge with a TLB flush.
@@ -1416,7 +1408,7 @@ static void free_vmap_area_noflush(struct vmap_area *va)
 static void free_unmap_vmap_area(struct vmap_area *va)
 {
flush_cache_vunmap(va->va_start, va->va_end);
-   unmap_vmap_area(va);
+   unmap_kernel_range_noflush(va->va_start, va->va_end - va->va_start);
if (debug_pagealloc_enabled_static())
flush_tlb_kernel_range(va->va_start, va->va_end);
 
-- 
2.25.1



[PATCH 17/28] mm: remove the prot argument from vm_map_ram

2020-04-08 Thread Christoph Hellwig
This is always GFP_KERNEL - for long term mappings with other properties
vmap should be used.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c   | 2 +-
 drivers/media/common/videobuf2/videobuf2-dma-sg.c  | 3 +--
 drivers/media/common/videobuf2/videobuf2-vmalloc.c | 3 +--
 fs/erofs/decompressor.c| 2 +-
 fs/xfs/xfs_buf.c   | 2 +-
 include/linux/vmalloc.h| 3 +--
 mm/nommu.c | 2 +-
 mm/vmalloc.c   | 4 ++--
 8 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c 
b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
index 9272bef57092..debaf7b18ab5 100644
--- a/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
+++ b/drivers/gpu/drm/i915/gem/selftests/mock_dmabuf.c
@@ -66,7 +66,7 @@ static void *mock_dmabuf_vmap(struct dma_buf *dma_buf)
 {
struct mock_dmabuf *mock = to_mock(dma_buf);
 
-   return vm_map_ram(mock->pages, mock->npages, 0, PAGE_KERNEL);
+   return vm_map_ram(mock->pages, mock->npages, 0);
 }
 
 static void mock_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
diff --git a/drivers/media/common/videobuf2/videobuf2-dma-sg.c 
b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
index 6db60e9d5183..92072a08af25 100644
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c
+++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c
@@ -309,8 +309,7 @@ static void *vb2_dma_sg_vaddr(void *buf_priv)
if (buf->db_attach)
buf->vaddr = dma_buf_vmap(buf->db_attach->dmabuf);
else
-   buf->vaddr = vm_map_ram(buf->pages,
-   buf->num_pages, -1, PAGE_KERNEL);
+   buf->vaddr = vm_map_ram(buf->pages, buf->num_pages, -1);
}
 
/* add offset in case userptr is not page-aligned */
diff --git a/drivers/media/common/videobuf2/videobuf2-vmalloc.c 
b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
index 1a4f0ca87c7c..c66fda4a65e4 100644
--- a/drivers/media/common/videobuf2/videobuf2-vmalloc.c
+++ b/drivers/media/common/videobuf2/videobuf2-vmalloc.c
@@ -107,8 +107,7 @@ static void *vb2_vmalloc_get_userptr(struct device *dev, 
unsigned long vaddr,
buf->vaddr = (__force void *)
ioremap(__pfn_to_phys(nums[0]), size + offset);
} else {
-   buf->vaddr = vm_map_ram(frame_vector_pages(vec), n_pages, -1,
-   PAGE_KERNEL);
+   buf->vaddr = vm_map_ram(frame_vector_pages(vec), n_pages, -1);
}
 
if (!buf->vaddr)
diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
index 5d2d81940679..7628816f2453 100644
--- a/fs/erofs/decompressor.c
+++ b/fs/erofs/decompressor.c
@@ -274,7 +274,7 @@ static int z_erofs_decompress_generic(struct 
z_erofs_decompress_req *rq,
 
i = 0;
while (1) {
-   dst = vm_map_ram(rq->out, nrpages_out, -1, PAGE_KERNEL);
+   dst = vm_map_ram(rq->out, nrpages_out, -1);
 
/* retry two more times (totally 3 times) */
if (dst || ++i >= 3)
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index f880141a2268..940af9da6db1 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -474,7 +474,7 @@ _xfs_buf_map_pages(
nofs_flag = memalloc_nofs_save();
do {
bp->b_addr = vm_map_ram(bp->b_pages, bp->b_page_count,
-   -1, PAGE_KERNEL);
+   -1);
if (bp->b_addr)
break;
vm_unmap_aliases();
diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 15ffbd8e8e65..9273b1a91ca5 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -88,8 +88,7 @@ struct vmap_area {
  * Highlevel APIs for driver use
  */
 extern void vm_unmap_ram(const void *mem, unsigned int count);
-extern void *vm_map_ram(struct page **pages, unsigned int count,
-   int node, pgprot_t prot);
+extern void *vm_map_ram(struct page **pages, unsigned int count, int node);
 extern void vm_unmap_aliases(void);
 
 #ifdef CONFIG_MMU
diff --git a/mm/nommu.c b/mm/nommu.c
index 318df4e236c9..4f07b7ef0297 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -351,7 +351,7 @@ void vunmap(const void *addr)
 }
 EXPORT_SYMBOL(vunmap);
 
-void *vm_map_ram(struct page **pages, unsigned int count, int node, pgprot_t 
prot)
+void *vm_map_ram(struct page **pages, unsigned int count, int node)
 {
BUG();
return NULL;
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 258220b203f1..7356b3f07bd8 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1834,7 +1834,7 @@ EXPORT_SYMBOL(vm_unmap_ram);
  *
  * Returns: a 

[PATCH 18/28] mm: enforce that vmap can't map pages executable

2020-04-08 Thread Christoph Hellwig
To help enforcing the W^X protection don't allow remapping existing
pages as executable.

Based on patch from Peter Zijlstra .

Signed-off-by: Christoph Hellwig 
---
 arch/x86/include/asm/pgtable_types.h | 6 ++
 include/asm-generic/pgtable.h| 4 
 mm/vmalloc.c | 2 +-
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/pgtable_types.h 
b/arch/x86/include/asm/pgtable_types.h
index 947867f112ea..2e7c442cc618 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -282,6 +282,12 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
 
 typedef struct { pgdval_t pgd; } pgd_t;
 
+static inline pgprot_t pgprot_nx(pgprot_t prot)
+{
+   return __pgprot(pgprot_val(prot) | _PAGE_NX);
+}
+#define pgprot_nx pgprot_nx
+
 #ifdef CONFIG_X86_PAE
 
 /*
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 329b8c8ca703..8c5f9c29698b 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -491,6 +491,10 @@ static inline int arch_unmap_one(struct mm_struct *mm,
 #define flush_tlb_fix_spurious_fault(vma, address) flush_tlb_page(vma, address)
 #endif
 
+#ifndef pgprot_nx
+#define pgprot_nx(prot)(prot)
+#endif
+
 #ifndef pgprot_noncached
 #define pgprot_noncached(prot) (prot)
 #endif
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7356b3f07bd8..334c75251ddb 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2390,7 +2390,7 @@ void *vmap(struct page **pages, unsigned int count,
if (!area)
return NULL;
 
-   if (map_kernel_range((unsigned long)area->addr, size, prot,
+   if (map_kernel_range((unsigned long)area->addr, size, pgprot_nx(prot),
pages) < 0) {
vunmap(area->addr);
return NULL;
-- 
2.25.1



[PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-08 Thread Christoph Hellwig
If this code was broken for non-coherent caches a crude powerpc hack
isn't going to help anyone else.  Remove the hack as it is the last
user of __vmalloc passing a page protection flag other than PAGE_KERNEL.

Signed-off-by: Christoph Hellwig 
---
 drivers/gpu/drm/drm_scatter.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
index ca520028b2cb..f4e6184d1877 100644
--- a/drivers/gpu/drm/drm_scatter.c
+++ b/drivers/gpu/drm/drm_scatter.c
@@ -43,15 +43,6 @@
 
 #define DEBUG_SCATTER 0
 
-static inline void *drm_vmalloc_dma(unsigned long size)
-{
-#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
-   return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
-#else
-   return vmalloc_32(size);
-#endif
-}
-
 static void drm_sg_cleanup(struct drm_sg_mem * entry)
 {
struct page *page;
@@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void *data,
return -ENOMEM;
}
 
-   entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
+   entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
if (!entry->virtual) {
kfree(entry->busaddr);
kfree(entry->pagelist);
-- 
2.25.1



[PATCH 22/28] mm: remove both instances of __vmalloc_node_flags

2020-04-08 Thread Christoph Hellwig
The real version just had a few callers that can open code it and
remove one layer of indirection.  The nommu stub was public but only
had a single caller, so remove it and avoid a CONFIG_MMU ifdef in
vmalloc.h.

Signed-off-by: Christoph Hellwig 
---
 include/linux/vmalloc.h |  9 -
 mm/nommu.c  |  3 ++-
 mm/vmalloc.c| 20 ++--
 3 files changed, 8 insertions(+), 24 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index c1b9d6eca05f..4a46d296e70d 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -115,17 +115,8 @@ extern void *__vmalloc_node_range(unsigned long size, 
unsigned long align,
unsigned long start, unsigned long end, gfp_t gfp_mask,
pgprot_t prot, unsigned long vm_flags, int node,
const void *caller);
-#ifndef CONFIG_MMU
-extern void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags);
-static inline void *__vmalloc_node_flags_caller(unsigned long size, int node,
-   gfp_t flags, void *caller)
-{
-   return __vmalloc_node_flags(size, node, flags);
-}
-#else
 extern void *__vmalloc_node_flags_caller(unsigned long size,
 int node, gfp_t flags, void *caller);
-#endif
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
diff --git a/mm/nommu.c b/mm/nommu.c
index 2df549adb22b..9553efa59787 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -150,7 +150,8 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-void *__vmalloc_node_flags(unsigned long size, int node, gfp_t flags)
+void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
+   void *caller)
 {
return __vmalloc(size, flags);
 }
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index de7952959e82..3d59d848ad48 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2566,14 +2566,6 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-static inline void *__vmalloc_node_flags(unsigned long size,
-   int node, gfp_t flags)
-{
-   return __vmalloc_node(size, 1, flags, node,
-   __builtin_return_address(0));
-}
-
-
 void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
  void *caller)
 {
@@ -2594,8 +2586,8 @@ void *__vmalloc_node_flags_caller(unsigned long size, int 
node, gfp_t flags,
  */
 void *vmalloc(unsigned long size)
 {
-   return __vmalloc_node_flags(size, NUMA_NO_NODE,
-   GFP_KERNEL);
+   return __vmalloc_node(size, 1, GFP_KERNEL, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc);
 
@@ -2614,8 +2606,8 @@ EXPORT_SYMBOL(vmalloc);
  */
 void *vzalloc(unsigned long size)
 {
-   return __vmalloc_node_flags(size, NUMA_NO_NODE,
-   GFP_KERNEL | __GFP_ZERO);
+   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vzalloc);
 
@@ -2670,8 +2662,8 @@ EXPORT_SYMBOL(vmalloc_node);
  */
 void *vzalloc_node(unsigned long size, int node)
 {
-   return __vmalloc_node_flags(size, node,
-GFP_KERNEL | __GFP_ZERO);
+   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_ZERO, node,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vzalloc_node);
 
-- 
2.25.1



[PATCH 21/28] mm: remove the prot argument to __vmalloc_node

2020-04-08 Thread Christoph Hellwig
This is always PAGE_KERNEL now.

Signed-off-by: Christoph Hellwig 
---
 mm/vmalloc.c | 35 ++-
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 466a449b3a15..de7952959e82 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2401,8 +2401,7 @@ void *vmap(struct page **pages, unsigned int count,
 EXPORT_SYMBOL(vmap);
 
 static void *__vmalloc_node(unsigned long size, unsigned long align,
-   gfp_t gfp_mask, pgprot_t prot,
-   int node, const void *caller);
+   gfp_t gfp_mask, int node, const void *caller);
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 pgprot_t prot, int node)
 {
@@ -2420,7 +2419,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, 
gfp_t gfp_mask,
/* Please note that the recursion is strictly bounded. */
if (array_size > PAGE_SIZE) {
pages = __vmalloc_node(array_size, 1, nested_gfp|highmem_mask,
-   PAGE_KERNEL, node, area->caller);
+   node, area->caller);
} else {
pages = kmalloc_node(array_size, nested_gfp, node);
}
@@ -2539,13 +2538,11 @@ EXPORT_SYMBOL_GPL(__vmalloc_node_range);
  * @size:  allocation size
  * @align: desired alignment
  * @gfp_mask:  flags for the page level allocator
- * @prot:  protection mask for the allocated pages
  * @node:  node to use for allocation or NUMA_NO_NODE
  * @caller:caller's return address
  *
- * Allocate enough pages to cover @size from the page level
- * allocator with @gfp_mask flags.  Map them into contiguous
- * kernel virtual space, using a pagetable protection of @prot.
+ * Allocate enough pages to cover @size from the page level allocator with
+ * @gfp_mask flags.  Map them into contiguous kernel virtual space.
  *
  * Reclaim modifiers in @gfp_mask - __GFP_NORETRY, __GFP_RETRY_MAYFAIL
  * and __GFP_NOFAIL are not supported
@@ -2556,16 +2553,15 @@ EXPORT_SYMBOL_GPL(__vmalloc_node_range);
  * Return: pointer to the allocated memory or %NULL on error
  */
 static void *__vmalloc_node(unsigned long size, unsigned long align,
-   gfp_t gfp_mask, pgprot_t prot,
-   int node, const void *caller)
+   gfp_t gfp_mask, int node, const void *caller)
 {
return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
-   gfp_mask, prot, 0, node, caller);
+   gfp_mask, PAGE_KERNEL, 0, node, caller);
 }
 
 void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 {
-   return __vmalloc_node(size, 1, gfp_mask, PAGE_KERNEL, NUMA_NO_NODE,
+   return __vmalloc_node(size, 1, gfp_mask, NUMA_NO_NODE,
__builtin_return_address(0));
 }
 EXPORT_SYMBOL(__vmalloc);
@@ -2573,15 +2569,15 @@ EXPORT_SYMBOL(__vmalloc);
 static inline void *__vmalloc_node_flags(unsigned long size,
int node, gfp_t flags)
 {
-   return __vmalloc_node(size, 1, flags, PAGE_KERNEL,
-   node, __builtin_return_address(0));
+   return __vmalloc_node(size, 1, flags, node,
+   __builtin_return_address(0));
 }
 
 
 void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
  void *caller)
 {
-   return __vmalloc_node(size, 1, flags, PAGE_KERNEL, node, caller);
+   return __vmalloc_node(size, 1, flags, node, caller);
 }
 
 /**
@@ -2656,8 +2652,8 @@ EXPORT_SYMBOL(vmalloc_user);
  */
 void *vmalloc_node(unsigned long size, int node)
 {
-   return __vmalloc_node(size, 1, GFP_KERNEL, PAGE_KERNEL,
-   node, __builtin_return_address(0));
+   return __vmalloc_node(size, 1, GFP_KERNEL, node,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_node);
 
@@ -2670,9 +2666,6 @@ EXPORT_SYMBOL(vmalloc_node);
  * allocator and map them into contiguous kernel virtual space.
  * The memory allocated is set to zero.
  *
- * For tight control over page level allocator and protection flags
- * use __vmalloc_node() instead.
- *
  * Return: pointer to the allocated memory or %NULL on error
  */
 void *vzalloc_node(unsigned long size, int node)
@@ -2745,8 +2738,8 @@ void *vmalloc_exec(unsigned long size)
  */
 void *vmalloc_32(unsigned long size)
 {
-   return __vmalloc_node(size, 1, GFP_VMALLOC32, PAGE_KERNEL,
- NUMA_NO_NODE, __builtin_return_address(0));
+   return __vmalloc_node(size, 1, GFP_VMALLOC32, NUMA_NO_NODE,
+   __builtin_return_address(0));
 }
 EXPORT_SYMBOL(vmalloc_32);
 
-- 
2.25.1



[PATCH 20/28] mm: remove the pgprot argument to __vmalloc

2020-04-08 Thread Christoph Hellwig
The pgprot argument to __vmalloc is always PROT_KERNEL now, so remove
it.

Signed-off-by: Christoph Hellwig 
---
 arch/x86/hyperv/hv_init.c  |  3 +--
 arch/x86/include/asm/kvm_host.h|  3 +--
 arch/x86/kvm/svm.c |  3 +--
 drivers/block/drbd/drbd_bitmap.c   |  4 +---
 drivers/gpu/drm/etnaviv/etnaviv_dump.c |  4 ++--
 drivers/lightnvm/pblk-init.c   |  5 ++---
 drivers/md/dm-bufio.c  |  4 ++--
 drivers/mtd/ubi/io.c   |  4 ++--
 drivers/scsi/sd_zbc.c  |  3 +--
 fs/gfs2/dir.c  |  9 -
 fs/gfs2/quota.c|  2 +-
 fs/nfs/blocklayout/extent_tree.c   |  2 +-
 fs/ntfs/malloc.h   |  2 +-
 fs/ubifs/debug.c   |  2 +-
 fs/ubifs/lprops.c  |  2 +-
 fs/ubifs/lpt_commit.c  |  4 ++--
 fs/ubifs/orphan.c  |  2 +-
 fs/xfs/kmem.c  |  2 +-
 include/linux/vmalloc.h|  2 +-
 kernel/bpf/core.c  |  6 +++---
 kernel/groups.c|  2 +-
 kernel/module.c|  3 +--
 mm/nommu.c | 15 +++
 mm/page_alloc.c|  2 +-
 mm/percpu.c|  2 +-
 mm/vmalloc.c   |  4 ++--
 net/bridge/netfilter/ebtables.c|  6 ++
 sound/core/memalloc.c  |  2 +-
 sound/core/pcm_memory.c|  2 +-
 29 files changed, 47 insertions(+), 59 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 5a4b363ba67b..a3d689dfc745 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -95,8 +95,7 @@ static int hv_cpu_init(unsigned int cpu)
 * not be stopped in the case of CPU offlining and the VM will hang.
 */
if (!*hvp) {
-   *hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO,
-PAGE_KERNEL);
+   *hvp = __vmalloc(PAGE_SIZE, GFP_KERNEL | __GFP_ZERO);
}
 
if (*hvp) {
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 42a2d0d3984a..71bc09bff01a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1280,8 +1280,7 @@ extern struct kmem_cache *x86_fpu_cache;
 #define __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
-   return __vmalloc(kvm_x86_ops.vm_size,
-GFP_KERNEL_ACCOUNT | __GFP_ZERO, PAGE_KERNEL);
+   return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 }
 void kvm_arch_free_vm(struct kvm *kvm);
 
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 851e9cc79930..83e8323ba4f2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1927,8 +1927,7 @@ static struct page **sev_pin_memory(struct kvm *kvm, 
unsigned long uaddr,
/* Avoid using vmalloc for smaller buffers. */
size = npages * sizeof(struct page *);
if (size > PAGE_SIZE)
-   pages = __vmalloc(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO,
- PAGE_KERNEL);
+   pages = __vmalloc(size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
else
pages = kmalloc(size, GFP_KERNEL_ACCOUNT);
 
diff --git a/drivers/block/drbd/drbd_bitmap.c b/drivers/block/drbd/drbd_bitmap.c
index 15e99697234a..df53dca5d02c 100644
--- a/drivers/block/drbd/drbd_bitmap.c
+++ b/drivers/block/drbd/drbd_bitmap.c
@@ -396,9 +396,7 @@ static struct page **bm_realloc_pages(struct drbd_bitmap 
*b, unsigned long want)
bytes = sizeof(struct page *)*want;
new_pages = kzalloc(bytes, GFP_NOIO | __GFP_NOWARN);
if (!new_pages) {
-   new_pages = __vmalloc(bytes,
-   GFP_NOIO | __GFP_ZERO,
-   PAGE_KERNEL);
+   new_pages = __vmalloc(bytes, GFP_NOIO | __GFP_ZERO);
if (!new_pages)
return NULL;
}
diff --git a/drivers/gpu/drm/etnaviv/etnaviv_dump.c 
b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
index 648cf0207309..706af0304ca4 100644
--- a/drivers/gpu/drm/etnaviv/etnaviv_dump.c
+++ b/drivers/gpu/drm/etnaviv/etnaviv_dump.c
@@ -154,8 +154,8 @@ void etnaviv_core_dump(struct etnaviv_gem_submit *submit)
file_size += sizeof(*iter.hdr) * n_obj;
 
/* Allocate the file in vmalloc memory, it's likely to be big */
-   iter.start = __vmalloc(file_size, GFP_KERNEL | __GFP_NOWARN | 
__GFP_NORETRY,
-  PAGE_KERNEL);
+   iter.start = __vmalloc(file_size, GFP_KERNEL | __GFP_NOWARN |
+   __GFP_NORETRY);
if (!iter.start) {
mutex_unlock(&gpu->mmu_context->lock);
dev_warn(gpu->dev, "failed to allocate devcoredump file\n");
diff --git a/drivers/lightnvm/pblk-init.c b/drivers/

[PATCH 24/28] mm: switch the test_vmalloc module to use __vmalloc_node

2020-04-08 Thread Christoph Hellwig
No need to export the very low-level __vmalloc_node_range when the
test module can use a slightly higher level variant.

Signed-off-by: Christoph Hellwig 
---
 lib/test_vmalloc.c | 26 +++---
 mm/vmalloc.c   | 17 -
 2 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 8bbefcaddfe8..cd6aef05dfb4 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -91,12 +91,8 @@ static int random_size_align_alloc_test(void)
 */
size = ((rnd % 10) + 1) * PAGE_SIZE;
 
-   ptr = __vmalloc_node_range(size, align,
-  VMALLOC_START, VMALLOC_END,
-  GFP_KERNEL | __GFP_ZERO,
-  PAGE_KERNEL,
-  0, 0, __builtin_return_address(0));
-
+   ptr = __vmalloc_node(size, align, GFP_KERNEL | __GFP_ZERO,
+   __builtin_return_address(0));
if (!ptr)
return -1;
 
@@ -118,12 +114,8 @@ static int align_shift_alloc_test(void)
for (i = 0; i < BITS_PER_LONG; i++) {
align = ((unsigned long) 1) << i;
 
-   ptr = __vmalloc_node_range(PAGE_SIZE, align,
-   VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL | __GFP_ZERO,
-   PAGE_KERNEL,
-   0, 0, __builtin_return_address(0));
-
+   ptr = __vmalloc_node(PAGE_SIZE, align, GFP_KERNEL | __GFP_ZERO,
+   __builtin_return_address(0));
if (!ptr)
return -1;
 
@@ -139,13 +131,9 @@ static int fix_align_alloc_test(void)
int i;
 
for (i = 0; i < test_loop_count; i++) {
-   ptr = __vmalloc_node_range(5 * PAGE_SIZE,
-   THREAD_ALIGN << 1,
-   VMALLOC_START, VMALLOC_END,
-   GFP_KERNEL | __GFP_ZERO,
-   PAGE_KERNEL,
-   0, 0, __builtin_return_address(0));
-
+   ptr = __vmalloc_node(5 * PAGE_SIZE, THREAD_ALIGN << 1,
+   GFP_KERNEL | __GFP_ZERO,
+   __builtin_return_address(0));
if (!ptr)
return -1;
 
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index ae8249ef5821..333fbe77255a 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2522,15 +2522,6 @@ void *__vmalloc_node_range(unsigned long size, unsigned 
long align,
return NULL;
 }
 
-/*
- * This is only for performance analysis of vmalloc and stress purpose.
- * It is required by vmalloc test module, therefore do not use it other
- * than that.
- */
-#ifdef CONFIG_TEST_VMALLOC_MODULE
-EXPORT_SYMBOL_GPL(__vmalloc_node_range);
-#endif
-
 /**
  * __vmalloc_node - allocate virtually contiguous memory
  * @size:  allocation size
@@ -2556,6 +2547,14 @@ void *__vmalloc_node(unsigned long size, unsigned long 
align,
return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
gfp_mask, PAGE_KERNEL, 0, node, caller);
 }
+/*
+ * This is only for performance analysis of vmalloc and stress purpose.
+ * It is required by vmalloc test module, therefore do not use it other
+ * than that.
+ */
+#ifdef CONFIG_TEST_VMALLOC_MODULE
+EXPORT_SYMBOL_GPL(__vmalloc_node);
+#endif
 
 void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 {
-- 
2.25.1



[PATCH 23/28] mm: remove __vmalloc_node_flags_caller

2020-04-08 Thread Christoph Hellwig
Just use __vmalloc_node instead which gets and extra argument.  To be
able to to use __vmalloc_node in all caller make it available outside
of vmalloc and implement it in nommu.c.

Signed-off-by: Christoph Hellwig 
---
 include/linux/vmalloc.h |  4 ++--
 kernel/bpf/syscall.c|  5 ++---
 mm/nommu.c  |  4 ++--
 mm/util.c   |  2 +-
 mm/vmalloc.c| 10 +-
 5 files changed, 8 insertions(+), 17 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 4a46d296e70d..108f49b47756 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -115,8 +115,8 @@ extern void *__vmalloc_node_range(unsigned long size, 
unsigned long align,
unsigned long start, unsigned long end, gfp_t gfp_mask,
pgprot_t prot, unsigned long vm_flags, int node,
const void *caller);
-extern void *__vmalloc_node_flags_caller(unsigned long size,
-int node, gfp_t flags, void *caller);
+void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
+   int node, const void *caller);
 
 extern void vfree(const void *addr);
 extern void vfree_atomic(const void *addr);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 64783da34202..48d98ea8fad6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -299,9 +299,8 @@ static void *__bpf_map_area_alloc(u64 size, int numa_node, 
bool mmapable)
return vmalloc_user_node_flags(size, numa_node, GFP_KERNEL |
   __GFP_RETRY_MAYFAIL | flags);
}
-   return __vmalloc_node_flags_caller(size, numa_node,
-  GFP_KERNEL | __GFP_RETRY_MAYFAIL |
-  flags, __builtin_return_address(0));
+   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_RETRY_MAYFAIL | flags,
+ numa_node, __builtin_return_address(0));
 }
 
 void *bpf_map_area_alloc(u64 size, int numa_node)
diff --git a/mm/nommu.c b/mm/nommu.c
index 9553efa59787..81a86cd85893 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -150,8 +150,8 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
-   void *caller)
+void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
+   int node, const void *caller)
 {
return __vmalloc(size, flags);
 }
diff --git a/mm/util.c b/mm/util.c
index 988d11e6c17c..6d5868adbe18 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -580,7 +580,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
if (ret || size <= PAGE_SIZE)
return ret;
 
-   return __vmalloc_node_flags_caller(size, node, flags,
+   return __vmalloc_node(size, 1, flags, node,
__builtin_return_address(0));
 }
 EXPORT_SYMBOL(kvmalloc_node);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3d59d848ad48..ae8249ef5821 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2400,8 +2400,6 @@ void *vmap(struct page **pages, unsigned int count,
 }
 EXPORT_SYMBOL(vmap);
 
-static void *__vmalloc_node(unsigned long size, unsigned long align,
-   gfp_t gfp_mask, int node, const void *caller);
 static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 pgprot_t prot, int node)
 {
@@ -2552,7 +2550,7 @@ EXPORT_SYMBOL_GPL(__vmalloc_node_range);
  *
  * Return: pointer to the allocated memory or %NULL on error
  */
-static void *__vmalloc_node(unsigned long size, unsigned long align,
+void *__vmalloc_node(unsigned long size, unsigned long align,
gfp_t gfp_mask, int node, const void *caller)
 {
return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
@@ -2566,12 +2564,6 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
-void *__vmalloc_node_flags_caller(unsigned long size, int node, gfp_t flags,
- void *caller)
-{
-   return __vmalloc_node(size, 1, flags, node, caller);
-}
-
 /**
  * vmalloc - allocate virtually contiguous memory
  * @size:allocation size
-- 
2.25.1



[PATCH 27/28] s390: use __vmalloc_node in alloc_vm_stack

2020-04-08 Thread Christoph Hellwig
alloc_vm_stack can use a slightly higher level vmalloc function.

Signed-off-by: Christoph Hellwig 
---
 arch/powerpc/kernel/irq.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index a25ed47087ee..4518fb1d6bf4 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -735,9 +735,8 @@ void do_IRQ(struct pt_regs *regs)
 
 static void *__init alloc_vm_stack(void)
 {
-   return __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN, VMALLOC_START,
-   VMALLOC_END, THREADINFO_GFP, PAGE_KERNEL,
-0, NUMA_NO_NODE, (void*)_RET_IP_);
+   return __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, THREADINFO_GFP,
+ NUMA_NO_NODE, (void *)_RET_IP_);
 }
 
 static void __init vmap_irqstack_init(void)
-- 
2.25.1



[PATCH 26/28] arm64: use __vmalloc_node in arch_alloc_vmap_stack

2020-04-08 Thread Christoph Hellwig
arch_alloc_vmap_stack can use a slightly higher level vmalloc function.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/vmap_stack.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/include/asm/vmap_stack.h 
b/arch/arm64/include/asm/vmap_stack.h
index 0a12115d9638..0cc6636e3f15 100644
--- a/arch/arm64/include/asm/vmap_stack.h
+++ b/arch/arm64/include/asm/vmap_stack.h
@@ -19,10 +19,8 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t 
stack_size, int node)
 {
BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));
 
-   return __vmalloc_node_range(stack_size, THREAD_ALIGN,
-   VMALLOC_START, VMALLOC_END,
-   THREADINFO_GFP, PAGE_KERNEL, 0, node,
-   __builtin_return_address(0));
+   return __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
+   __builtin_return_address(0));
 }
 
 #endif /* __ASM_VMAP_STACK_H */
-- 
2.25.1



[PATCH 25/28] mm: remove vmalloc_user_node_flags

2020-04-08 Thread Christoph Hellwig
Open code it in __bpf_map_area_alloc, which is the only caller.  Also
clean up __bpf_map_area_alloc to have a single vmalloc call with
slightly different flags instead of the current two different calls.

For this to compile for the nommu case add a __vmalloc_node_range stub
to nommu.c.

Signed-off-by: Christoph Hellwig 
---
 include/linux/vmalloc.h |  1 -
 kernel/bpf/syscall.c| 23 +--
 mm/nommu.c  | 14 --
 mm/vmalloc.c| 20 
 4 files changed, 21 insertions(+), 37 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 108f49b47756..f90f2946aac2 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -106,7 +106,6 @@ extern void *vzalloc(unsigned long size);
 extern void *vmalloc_user(unsigned long size);
 extern void *vmalloc_node(unsigned long size, int node);
 extern void *vzalloc_node(unsigned long size, int node);
-extern void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t 
flags);
 extern void *vmalloc_exec(unsigned long size);
 extern void *vmalloc_32(unsigned long size);
 extern void *vmalloc_32_user(unsigned long size);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 48d98ea8fad6..249d9bd43321 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -281,26 +281,29 @@ static void *__bpf_map_area_alloc(u64 size, int 
numa_node, bool mmapable)
 * __GFP_RETRY_MAYFAIL to avoid such situations.
 */
 
-   const gfp_t flags = __GFP_NOWARN | __GFP_ZERO;
+   const gfp_t gfp = __GFP_NOWARN | __GFP_ZERO;
+   unsigned int flags = 0;
+   unsigned long align = 1;
void *area;
 
if (size >= SIZE_MAX)
return NULL;
 
/* kmalloc()'ed memory can't be mmap()'ed */
-   if (!mmapable && size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
-   area = kmalloc_node(size, GFP_USER | __GFP_NORETRY | flags,
+   if (mmapable) {
+   BUG_ON(!PAGE_ALIGNED(size));
+   align = SHMLBA;
+   flags = VM_USERMAP;
+   } else if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
+   area = kmalloc_node(size, gfp | GFP_USER | __GFP_NORETRY,
numa_node);
if (area != NULL)
return area;
}
-   if (mmapable) {
-   BUG_ON(!PAGE_ALIGNED(size));
-   return vmalloc_user_node_flags(size, numa_node, GFP_KERNEL |
-  __GFP_RETRY_MAYFAIL | flags);
-   }
-   return __vmalloc_node(size, 1, GFP_KERNEL | __GFP_RETRY_MAYFAIL | flags,
- numa_node, __builtin_return_address(0));
+
+   return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END,
+   gfp | GFP_KERNEL | __GFP_RETRY_MAYFAIL, PAGE_KERNEL,
+   flags, numa_node, __builtin_return_address(0));
 }
 
 void *bpf_map_area_alloc(u64 size, int numa_node)
diff --git a/mm/nommu.c b/mm/nommu.c
index 81a86cd85893..b42cd6003d7d 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -150,6 +150,14 @@ void *__vmalloc(unsigned long size, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__vmalloc);
 
+void *__vmalloc_node_range(unsigned long size, unsigned long align,
+   unsigned long start, unsigned long end, gfp_t gfp_mask,
+   pgprot_t prot, unsigned long vm_flags, int node,
+   const void *caller)
+{
+   return __vmalloc(size, flags);
+}
+
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
int node, const void *caller)
 {
@@ -180,12 +188,6 @@ void *vmalloc_user(unsigned long size)
 }
 EXPORT_SYMBOL(vmalloc_user);
 
-void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags)
-{
-   return __vmalloc_user_flags(size, flags | __GFP_ZERO);
-}
-EXPORT_SYMBOL(vmalloc_user_node_flags);
-
 struct page *vmalloc_to_page(const void *addr)
 {
return virt_to_page(addr);
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 333fbe77255a..f6f2acdaf70c 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -2658,26 +2658,6 @@ void *vzalloc_node(unsigned long size, int node)
 }
 EXPORT_SYMBOL(vzalloc_node);
 
-/**
- * vmalloc_user_node_flags - allocate memory for userspace on a specific node
- * @size: allocation size
- * @node: numa node
- * @flags: flags for the page level allocator
- *
- * The resulting memory area is zeroed so it can be mapped to userspace
- * without leaking data.
- *
- * Return: pointer to the allocated memory or %NULL on error
- */
-void *vmalloc_user_node_flags(unsigned long size, int node, gfp_t flags)
-{
-   return __vmalloc_node_range(size, SHMLBA,  VMALLOC_START, VMALLOC_END,
-   flags | __GFP_ZERO, PAGE_KERNEL,
-   VM_USERMAP, node,
-   __builtin_return_address(0));
-}
-EXPORT_SYMBOL(vmalloc_user_node_fla

[PATCH 28/28] s390: use __vmalloc_node in stack_alloc

2020-04-08 Thread Christoph Hellwig
stack_alloc can use a slightly higher level vmalloc function.

Signed-off-by: Christoph Hellwig 
---
 arch/s390/kernel/setup.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
index 36445dd40fdb..0f0b140b5558 100644
--- a/arch/s390/kernel/setup.c
+++ b/arch/s390/kernel/setup.c
@@ -305,12 +305,9 @@ void *restart_stack __section(.data);
 unsigned long stack_alloc(void)
 {
 #ifdef CONFIG_VMAP_STACK
-   return (unsigned long)
-   __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
-VMALLOC_START, VMALLOC_END,
-THREADINFO_GFP,
-PAGE_KERNEL, 0, NUMA_NO_NODE,
-__builtin_return_address(0));
+   return (unsigned long)__vmalloc_node(THREAD_SIZE, THREAD_SIZE,
+   THREADINFO_GFP, NUMA_NO_NODE,
+   __builtin_return_address(0));
 #else
return __get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
 #endif
-- 
2.25.1



Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-04-08 Thread Michael Ellerman
Benjamin Herrenschmidt  writes:
> On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:
>> Benjamin Herrenschmidt  writes:
>> > On Tue, 2020-03-31 at 16:30 +1100, Michael Ellerman wrote:
>> > > I have no attachment to 40x, and I'd certainly be happy to have
>> > > less
>> > > code in the tree, we struggle to keep even the modern platforms
>> > > well
>> > > maintained.
>> > > 
>> > > At the same time I don't want to render anyone's hardware
>> > > obsolete
>> > > unnecessarily. But if there's really no one using 40x then we
>> > > should
>> > > remove it, it could well be broken already.
>> > > 
>> > > So I guess post a series to do the removal and we'll see if
>> > > anyone
>> > > speaks up.
>> > 
>> > We shouldn't remove 40x completely. Just remove the Xilinx 405
>> > stuff.
>> 
>> Congratulations on becoming the 40x maintainer!
>
> Didn't I give you my last 40x system ? :-)

Probably, but my desk is nearly as messy as yours so it's probably
buried under some even more obscure hardware :P

> IBM still put 40x cores inside POWER chips no ?

Oh yeah that's true. I guess most folks don't know that, or that they
run RHEL on them.

cheers


Re: [PATCH 1/1] powerpc/crash: Use NMI context for printk after crashing other CPUs

2020-04-08 Thread Michael Ellerman
Leonardo Bras  writes:
> Currently, if printk lock (logbuf_lock) is held by other thread during
> crash, there is a chance of deadlocking the crash on next printk, and
> blocking a possibly desired kdump.
>
> After sending IPI to all other CPUs, make printk enter in NMI context,
> as it will use per-cpu buffers to store the message, and avoid locking
> logbuf_lock.
>
> Signed-off-by: Leonardo Bras 
> ---
>  arch/powerpc/kexec/crash.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
> index d488311efab1..9b73e3991bf4 100644
> --- a/arch/powerpc/kexec/crash.c
> +++ b/arch/powerpc/kexec/crash.c
> @@ -115,6 +115,7 @@ static void crash_kexec_prepare_cpus(int cpu)

Added context:

printk(KERN_EMERG "Sending IPI to other CPUs\n");

if (crash_wake_offline)
ncpus = num_present_cpus() - 1;

>  
>   crash_send_ipi(crash_ipi_callback);
>   smp_wmb();
> + printk_nmi_enter();
  
Why did you decide to put it there, rather than at the start of
default_machine_crash_shutdown() like I did?

The printk() above could have already deadlocked if another CPU is stuck
with the logbuf lock held.

cheers


Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

2020-04-08 Thread Gerald Schaefer
On Wed, 8 Apr 2020 12:41:51 +0530
Anshuman Khandual  wrote:

[...]
> >   
> >>
> >> Some thing like this instead.
> >>
> >> pte_t pte = READ_ONCE(*ptep);
> >> pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));
> >>
> >> We cannot use mk_pte_phys() as it is defined only on some platforms
> >> without any generic fallback for others.  
> > 
> > Oh, didn't know that, sorry. What about using mk_pte() instead, at least
> > it would result in a present pte:
> > 
> > pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));  
> 
> Lets use mk_pte() here but can we do this instead
> 
> paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
> pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));
> 

Sure, that will also work.

BTW, this RANDOM_ORVALUE is not really very random, the way it is
defined. For s390 we already changed it to mask out some arch bits,
but I guess there are other archs and bits that would always be
set with this "not so random" value, and I wonder if/how that would
affect all the tests using this value, see also below.

> > 
> > And if you also want to do some with the existing value, which seems
> > to be an empty pte, then maybe just check if writing and reading that
> > value with set_huge_pte_at() / huge_ptep_get() returns the same,
> > i.e. initially w/o RANDOM_ORVALUE.
> > 
> > So, in combination, like this (BTW, why is the barrier() needed, it
> > is not used for the other set_huge_pte_at() calls later?):  
> 
> Ahh missed, will add them. Earlier we faced problem without it after
> set_pte_at() for a test on powerpc (64) platform. Hence just added it
> here to be extra careful.
> 
> > 
> > @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
> > struct page *page = pfn_to_page(pfn);
> > pte_t pte = READ_ONCE(*ptep);
> >  
> > -   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
> > +   set_huge_pte_at(mm, vaddr, ptep, pte);
> > +   WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> > +
> > +   pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), 
> > prot));
> > set_huge_pte_at(mm, vaddr, ptep, pte);
> > barrier();
> > WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
> > 
> > This would actually add a new test "write empty pte with
> > set_huge_pte_at(), then verify with huge_ptep_get()", which happens
> > to trigger a warning on s390 :-)  
> 
> On arm64 as well which checks for pte_present() in set_huge_pte_at().
> But PTE present check is not really present in each set_huge_pte_at()
> implementation especially without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT.
> Hence wondering if we should add this new test here which will keep
> giving warnings on s390 and arm64 (at the least).

Hmm, interesting. I forgot about huge swap / migration, which is not
(and probably cannot be) supported on s390. The pte_present() check
on arm64 seems to check for such huge swap / migration entries,
according to the comment.

The new test "write empty pte with set_huge_pte_at(), then verify
with huge_ptep_get()" would then probably trigger the
WARN_ON(!pte_present(pte)) in arm64 code. So I guess "writing empty
ptes with set_huge_pte_at()" is not really a valid use case in practice,
or else you would have seen this warning before. In that case, it
might not be a good idea to add this test.

I also do wonder now, why the original test with
"pte = __pte(pte_val(pte) | RANDOM_ORVALUE);"
did not also trigger that warning on arm64. On s390 this test failed
exactly because the constructed pte was not present (initially empty,
or'ing RANDOM_ORVALUE does not make it present for s390). I guess this
just worked by chance on arm64, because the bits from RANDOM_ORVALUE
also happened to mark the pte present for arm64.

This brings us back to the question above, regarding the "randomness"
of RANDOM_ORVALUE. Not really sure what the intention behind that was,
but maybe it would make sense to restrict this RANDOM_ORVALUE to
non-arch-specific bits, i.e. only bits that would be part of the
address value within a page table entry? Or was it intentionally
chosen to also mess with other bits?

Regards,
Gerald



Re: [PATCH 26/28] arm64: use __vmalloc_node in arch_alloc_vmap_stack

2020-04-08 Thread Mark Rutland
On Wed, Apr 08, 2020 at 01:59:24PM +0200, Christoph Hellwig wrote:
> arch_alloc_vmap_stack can use a slightly higher level vmalloc function.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Mark Rutland 

Mark.

> ---
>  arch/arm64/include/asm/vmap_stack.h | 6 ++
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/vmap_stack.h 
> b/arch/arm64/include/asm/vmap_stack.h
> index 0a12115d9638..0cc6636e3f15 100644
> --- a/arch/arm64/include/asm/vmap_stack.h
> +++ b/arch/arm64/include/asm/vmap_stack.h
> @@ -19,10 +19,8 @@ static inline unsigned long *arch_alloc_vmap_stack(size_t 
> stack_size, int node)
>  {
>   BUILD_BUG_ON(!IS_ENABLED(CONFIG_VMAP_STACK));
>  
> - return __vmalloc_node_range(stack_size, THREAD_ALIGN,
> - VMALLOC_START, VMALLOC_END,
> - THREADINFO_GFP, PAGE_KERNEL, 0, node,
> - __builtin_return_address(0));
> + return __vmalloc_node(stack_size, THREAD_ALIGN, THREADINFO_GFP, node,
> + __builtin_return_address(0));
>  }
>  
>  #endif /* __ASM_VMAP_STACK_H */
> -- 
> 2.25.1
> 


Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Michael Ellerman
Leonardo Bras  writes:
> Hello Nick, Michael,
>
> On Fri, 2020-04-03 at 16:41 +1000, Nicholas Piggin wrote:
> [...]
>> > > PAPR says we are not allowed to have multiple CPUs calling RTAS at once,
>> > > except for a very small list of RTAS calls. So if we bust the RTAS lock
>> > > there's a risk we violate that part of PAPR and crash even harder.
>> > 
>> > Interesting, I was not aware.
>> > 
>> > > Also it's not specific to kdump, we can't even get through a normal
>> > > reboot if we crash with the RTAS lock held.
>> > > 
>> > > Anyway here's a patch with some ideas. That allows me to get from a
>> > > crash with the RTAS lock held through kdump into the 2nd kernel. But it
>> > > only works if it's the crashing CPU that holds the RTAS lock.
>> > > 
>> > 
>> > Nice idea. 
>> > But my test environment is just triggering a crash from sysrq, so I
>> > think it would not improve the result, given that this thread is
>> > probably not holding the lock by the time.
>> 
>> Crash paths should not take that RTAS lock, it's a massive pain. I'm 
>> fixing it for machine check, for other crashes I think it can be removed 
>> too, it just needs to be unpicked. The good thing with crashing is that 
>> you can reasonably *know* that you're single threaded, so you can 
>> usually reason through situations like above.
>> 
>> > I noticed that when rtas is locked, irqs and preemption are also
>> > disabled.
>> > 
>> > Should the IPI send by crash be able to interrupt a thread with
>> > disabled irqs?
>> 
>> Yes. It's been a bit painful, but in the long term it means that a CPU 
>> which hangs with interrupts off can be debugged, and it means we can 
>> take it offline to crash without risking that it will be clobbering what 
>> we're doing.
>> 
>> Arguably what I should have done is try a regular IPI first, wait a few 
>> seconds, then NMI IPI.
>> 
>> A couple of problems with that. Firstly it probably avoids this issue 
>> you hit almost all the time, so it won't get fixed. So when we really 
>> need the NMI IPI in the field, it'll still be riddled with deadlocks.
>> 
>> Secondly, sending the IPI first in theory can be more intrusive to the 
>> state that we want to debug. It uses the currently running stack, paca 
>> save areas, ec. NMI IPI uses its own stack and save regions so it's a 
>> little more isolated. Maybe this is only a small advantage but I'd like 
>> to have it if we can.  
>
> I think the printk issue is solved (sent a patch on that), now what is
> missing is the rtas call spinlock.
>
> I noticed that rtas.lock is taken on machine_kexec_mask_interrupts(),
> which happen after crashing the other threads and getting into
> realmode. 
>
> The following rtas are called each IRQ with valid interrupt descriptor:
> ibm,int-off : Reset mask bit for that interrupt
> ibm,set_xive : Set XIVE priority to 0xff
>
> By what I could understand, these rtas calls happen to put the next
> kexec kernel (kdump kernel) in a safer environment, so I think it's not
> safe to just remove them.

Yes.

> (See commit d6c1a9081080c6c4658acf2a06d851feb2855933)

In hindsight the person who wrote that commit was being lazy. We
*should* have made the 2nd kernel robust against the IRQ state being
messed up.

> On the other hand, busting the rtas.lock could be dangerous, because
> it's code we can't control.
>
> According with LoPAR, for both of these rtas-calls, we have:
>
> For the PowerPC External Interrupt option: The call must be reentrant
> to the number of processors on the platform.
> For the PowerPC External Interrupt option: The argument call buffer for
> each simultaneous call must be physically unique.

Oh well spotted. Where is that in the doc?

> Which I think means this rtas-calls can be done simultaneously.

I think so too. I'll read PAPR in the morning and make sure.

> Would it mean that busting the rtas.lock for these calls would be safe?

What would be better is to make those specific calls not take the global
RTAS lock to begin with.

We should be able to just allocate the rtas_args on the stack, it's only
~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
take the global lock.

cheers


Re: [PATCH 17/28] mm: remove the prot argument from vm_map_ram

2020-04-08 Thread Peter Zijlstra
On Wed, Apr 08, 2020 at 01:59:15PM +0200, Christoph Hellwig wrote:
> This is always GFP_KERNEL - for long term mappings with other properties
> vmap should be used.

 PAGE_KERNEL != GFP_KERNEL :-)

> - return vm_map_ram(mock->pages, mock->npages, 0, PAGE_KERNEL);
> + return vm_map_ram(mock->pages, mock->npages, 0);


Re: [PATCH 17/28] mm: remove the prot argument from vm_map_ram

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 02:21:04PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 08, 2020 at 01:59:15PM +0200, Christoph Hellwig wrote:
> > This is always GFP_KERNEL - for long term mappings with other properties
> > vmap should be used.
> 
>  PAGE_KERNEL != GFP_KERNEL :-)

Yep.  The compiler complained about that a few times :)


Re: decruft the vmalloc API

2020-04-08 Thread Peter Zijlstra
On Wed, Apr 08, 2020 at 01:58:58PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> Peter noticed that with some dumb luck you can toast the kernel address
> space with exported vmalloc symbols.
> 
> I used this as an opportunity to decruft the vmalloc.c API and make it
> much more systematic.  This also removes any chance to create vmalloc
> mappings outside the designated areas or using executable permissions
> from modules.  Besides that it removes more than 300 lines of code.
> 

Looks great, thanks for doing this!

Acked-by: Peter Zijlstra (Intel) 


Re: [PATCH 19/28] gpu/drm: remove the powerpc hack in drm_legacy_sg_alloc

2020-04-08 Thread Daniel Vetter
On Wed, Apr 08, 2020 at 01:59:17PM +0200, Christoph Hellwig wrote:
> If this code was broken for non-coherent caches a crude powerpc hack
> isn't going to help anyone else.  Remove the hack as it is the last
> user of __vmalloc passing a page protection flag other than PAGE_KERNEL.

Well Ben added this to make stuff work on ppc, ofc the home grown dma
layer in drm from back then isn't going to work in other places. I guess
should have at least an ack from him, in case anyone still cares about
this on ppc. Adding Ben to cc.
-Daniel

> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/gpu/drm/drm_scatter.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_scatter.c b/drivers/gpu/drm/drm_scatter.c
> index ca520028b2cb..f4e6184d1877 100644
> --- a/drivers/gpu/drm/drm_scatter.c
> +++ b/drivers/gpu/drm/drm_scatter.c
> @@ -43,15 +43,6 @@
>  
>  #define DEBUG_SCATTER 0
>  
> -static inline void *drm_vmalloc_dma(unsigned long size)
> -{
> -#if defined(__powerpc__) && defined(CONFIG_NOT_COHERENT_CACHE)
> - return __vmalloc(size, GFP_KERNEL, pgprot_noncached_wc(PAGE_KERNEL));
> -#else
> - return vmalloc_32(size);
> -#endif
> -}
> -
>  static void drm_sg_cleanup(struct drm_sg_mem * entry)
>  {
>   struct page *page;
> @@ -126,7 +117,7 @@ int drm_legacy_sg_alloc(struct drm_device *dev, void 
> *data,
>   return -ENOMEM;
>   }
>  
> - entry->virtual = drm_vmalloc_dma(pages << PAGE_SHIFT);
> + entry->virtual = vmalloc_32(pages << PAGE_SHIFT);
>   if (!entry->virtual) {
>   kfree(entry->busaddr);
>   kfree(entry->pagelist);
> -- 
> 2.25.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH 18/28] mm: enforce that vmap can't map pages executable

2020-04-08 Thread Mark Rutland
On Wed, Apr 08, 2020 at 01:59:16PM +0200, Christoph Hellwig wrote:
> To help enforcing the W^X protection don't allow remapping existing
> pages as executable.
> 
> Based on patch from Peter Zijlstra .
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/x86/include/asm/pgtable_types.h | 6 ++
>  include/asm-generic/pgtable.h| 4 
>  mm/vmalloc.c | 2 +-
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/pgtable_types.h 
> b/arch/x86/include/asm/pgtable_types.h
> index 947867f112ea..2e7c442cc618 100644
> --- a/arch/x86/include/asm/pgtable_types.h
> +++ b/arch/x86/include/asm/pgtable_types.h
> @@ -282,6 +282,12 @@ typedef struct pgprot { pgprotval_t pgprot; } pgprot_t;
>  
>  typedef struct { pgdval_t pgd; } pgd_t;
>  
> +static inline pgprot_t pgprot_nx(pgprot_t prot)
> +{
> + return __pgprot(pgprot_val(prot) | _PAGE_NX);
> +}
> +#define pgprot_nx pgprot_nx
> +
>  #ifdef CONFIG_X86_PAE

I reckon for arm64 we can do similar in our :

#define pgprot_nx(pgprot_t prot) \
__pgprot_modify(prot, 0, PTE_PXN)

... matching the style of our existing pgprot_*() modifier helpers.

Mark.

>  
>  /*
> diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
> index 329b8c8ca703..8c5f9c29698b 100644
> --- a/include/asm-generic/pgtable.h
> +++ b/include/asm-generic/pgtable.h
> @@ -491,6 +491,10 @@ static inline int arch_unmap_one(struct mm_struct *mm,
>  #define flush_tlb_fix_spurious_fault(vma, address) flush_tlb_page(vma, 
> address)
>  #endif
>  
> +#ifndef pgprot_nx
> +#define pgprot_nx(prot)  (prot)
> +#endif
> +
>  #ifndef pgprot_noncached
>  #define pgprot_noncached(prot)   (prot)
>  #endif
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index 7356b3f07bd8..334c75251ddb 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2390,7 +2390,7 @@ void *vmap(struct page **pages, unsigned int count,
>   if (!area)
>   return NULL;
>  
> - if (map_kernel_range((unsigned long)area->addr, size, prot,
> + if (map_kernel_range((unsigned long)area->addr, size, pgprot_nx(prot),
>   pages) < 0) {
>   vunmap(area->addr);
>   return NULL;
> -- 
> 2.25.1
> 


Re: [PATCH 02/28] staging: android: ion: use vmap instead of vm_map_ram

2020-04-08 Thread Hillf Danton


On Wed,  8 Apr 2020 13:59:00 +0200
> 
> vm_map_ram can keep mappings around after the vm_unmap_ram.  Using that
> with non-PAGE_KERNEL mappings can lead to all kinds of aliasing issues.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/staging/android/ion/ion_heap.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/staging/android/ion/ion_heap.c 
> b/drivers/staging/android/ion/ion_heap.c
> index 473b465724f1..a2d5c6df4b96 100644
> --- a/drivers/staging/android/ion/ion_heap.c
> +++ b/drivers/staging/android/ion/ion_heap.c
> @@ -99,12 +99,12 @@ int ion_heap_map_user(struct ion_heap *heap, struct 
> ion_buffer *buffer,
>  
>  static int ion_heap_clear_pages(struct page **pages, int num, pgprot_t 
> pgprot)
>  {
> - void *addr = vm_map_ram(pages, num, -1, pgprot);
> + void *addr = vmap(pages, num, VM_MAP);

A merge glitch?

void *vmap(struct page **pages, unsigned int count,
   unsigned long flags, pgprot_t prot)
>  
>   if (!addr)
>   return -ENOMEM;
>   memset(addr, 0, PAGE_SIZE * num);
> - vm_unmap_ram(addr, num);
> + vunmap(addr);
>  
>   return 0;
>  }
> -- 
> 2.25.1



[PATCH V2 3/5] selftests/powerpc: Add NX-GZIP engine compress testcase

2020-04-08 Thread Raphael Moreira Zinsly
Daniel Axtens  writes:
> Raphael Moreira Zinsly  writes:
...
>> +#define hwsync()({ asm volatile("hwsync" ::: "memory"); })
>
> This doesn't compile on the clang version I tried as it doesn't
> recognise 'hwsync'.  Does
> asm volatile("sync" ::: "memory");
> do the same thing?

Both hwsync and sync are extended mnemonics to 'sync 0'.
I just replaced hwsync for sync on this patch, but I'm
surprised that this is not recognized by clang.

--- >8 ---
Add a compression testcase for the powerpc NX-GZIP engine.

Signed-off-by: Bulent Abali 
Signed-off-by: Raphael Moreira Zinsly 
---
 .../selftests/powerpc/nx-gzip/Makefile|  21 +
 .../selftests/powerpc/nx-gzip/gzfht_test.c| 489 ++
 .../selftests/powerpc/nx-gzip/gzip_vas.c  | 259 ++
 3 files changed, 769 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/nx-gzip/Makefile
 create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
 create mode 100644 tools/testing/selftests/powerpc/nx-gzip/gzip_vas.c

diff --git a/tools/testing/selftests/powerpc/nx-gzip/Makefile 
b/tools/testing/selftests/powerpc/nx-gzip/Makefile
new file mode 100644
index ..ab903f63bbbd
--- /dev/null
+++ b/tools/testing/selftests/powerpc/nx-gzip/Makefile
@@ -0,0 +1,21 @@
+CC = gcc
+CFLAGS = -O3
+INC = ./inc
+SRC = gzfht_test.c
+OBJ = $(SRC:.c=.o)
+TESTS = gzfht_test
+EXTRA_SOURCES = gzip_vas.c
+
+all:   $(TESTS)
+
+$(OBJ): %.o: %.c
+   $(CC) $(CFLAGS) -I$(INC) -c $<
+
+$(TESTS): $(OBJ)
+   $(CC) $(CFLAGS) -I$(INC) -o $@ $@.o $(EXTRA_SOURCES)
+
+run_tests: $(TESTS)
+   ./gzfht_test gzip_vas.c
+
+clean:
+   rm -f $(TESTS) *.o *~ *.gz
diff --git a/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c 
b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
new file mode 100644
index ..7a21c25f5611
--- /dev/null
+++ b/tools/testing/selftests/powerpc/nx-gzip/gzfht_test.c
@@ -0,0 +1,489 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+/* P9 gzip sample code for demonstrating the P9 NX hardware interface.
+ * Not intended for productive uses or for performance or compression
+ * ratio measurements.  For simplicity of demonstration, this sample
+ * code compresses in to fixed Huffman blocks only (Deflate btype=1)
+ * and has very simple memory management.  Dynamic Huffman blocks
+ * (Deflate btype=2) are more involved as detailed in the user guide.
+ * Note also that /dev/crypto/gzip, VAS and skiboot support are
+ * required.
+ *
+ * Copyright 2020 IBM Corp.
+ *
+ * https://github.com/libnxz/power-gzip for zlib api and other utils
+ *
+ * Author: Bulent Abali 
+ *
+ * Definitions of acronyms used here. See
+ * P9 NX Gzip Accelerator User's Manual for details:
+ * https://github.com/libnxz/power-gzip/blob/develop/doc/power_nx_gzip_um.pdf
+ *
+ * adler/crc: 32 bit checksums appended to stream tail
+ * ce:   completion extension
+ * cpb:  coprocessor parameter block (metadata)
+ * crb:  coprocessor request block (command)
+ * csb:  coprocessor status block (status)
+ * dht:  dynamic huffman table
+ * dde:  data descriptor element (address, length)
+ * ddl:  list of ddes
+ * dh/fh:dynamic and fixed huffman types
+ * fc:   coprocessor function code
+ * histlen:  history/dictionary length
+ * history:  sliding window of up to 32KB of data
+ * lzcount:  Deflate LZ symbol counts
+ * rembytecnt: remaining byte count
+ * sfbt: source final block type; last block's type during decomp
+ * spbc: source processed byte count
+ * subc: source unprocessed bit count
+ * tebc: target ending bit count; valid bits in the last byte
+ * tpbc: target processed byte count
+ * vas:  virtual accelerator switch; the user mode interface
+ */
+
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "nxu.h"
+#include "nx.h"
+
+int nx_dbg;
+FILE *nx_gzip_log;
+void *nx_fault_storage_address;
+
+#define NX_MIN(X, Y) (((X) < (Y)) ? (X) : (Y))
+#define FNAME_MAX 1024
+#define FEXT ".nx.gz"
+
+/*
+ * LZ counts returned in the user supplied nx_gzip_crb_cpb_t structure.
+ */
+static int compress_fht_sample(char *src, uint32_t srclen, char *dst,
+   uint32_t dstlen, int with_count,
+   struct nx_gzip_crb_cpb_t *cmdp, void *handle)
+{
+   int cc;
+   uint32_t fc;
+
+   assert(!!cmdp);
+
+   put32(cmdp->crb, gzip_fc, 0);  /* clear */
+   fc = (with_count) ? GZIP_FC_COMPRESS_RESUME_FHT_COUNT :
+   GZIP_FC_COMPRESS_RESUME_FHT;
+   putnn(cmdp->crb, gzip_fc, fc);
+   putnn(cmdp->cpb, in_histlen, 0); /* resuming with no history */
+   memset((void *) &cmdp->crb.csb, 0, sizeof(cmdp->crb.csb));
+
+   /* Section 6.6 programming notes; spbc may be in two different
+* places depending on FC.
+*/
+   if (!with_count)
+ 

Re: [PATCH 0/2] powerpc: Remove support for ppc405/440 Xilinx platforms

2020-04-08 Thread Arnd Bergmann
+On Wed, Apr 8, 2020 at 2:04 PM Michael Ellerman  wrote:
> Benjamin Herrenschmidt  writes:
> > On Fri, 2020-04-03 at 15:59 +1100, Michael Ellerman wrote:
> >> Benjamin Herrenschmidt  writes:
> > IBM still put 40x cores inside POWER chips no ?
>
> Oh yeah that's true. I guess most folks don't know that, or that they
> run RHEL on them.

Is there a reason for not having those dts files in mainline then?
If nothing else, it would document what machines are still being
used with future kernels.

Also, if that's the only 405 based product that is still relevant with
a 5.7+ kernel, it may be useful to know at which point they
move to a 476 core and stop updating kernels on the old ones.

  Arnd


Re: [PATCH 02/28] staging: android: ion: use vmap instead of vm_map_ram

2020-04-08 Thread Greg KH
On Wed, Apr 08, 2020 at 01:59:00PM +0200, Christoph Hellwig wrote:
> vm_map_ram can keep mappings around after the vm_unmap_ram.  Using that
> with non-PAGE_KERNEL mappings can lead to all kinds of aliasing issues.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: Greg Kroah-Hartman 


Re: [PATCH 27/28] s390: use __vmalloc_node in alloc_vm_stack

2020-04-08 Thread Christian Borntraeger
On 08.04.20 13:59, Christoph Hellwig wrote:
> alloc_vm_stack can use a slightly higher level vmalloc function.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/kernel/irq.c | 5 ++---

wrong subject (power vs s390)

>  1 file changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index a25ed47087ee..4518fb1d6bf4 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -735,9 +735,8 @@ void do_IRQ(struct pt_regs *regs)
>  
>  static void *__init alloc_vm_stack(void)
>  {
> - return __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN, VMALLOC_START,
> - VMALLOC_END, THREADINFO_GFP, PAGE_KERNEL,
> -  0, NUMA_NO_NODE, (void*)_RET_IP_);
> + return __vmalloc_node(THREAD_SIZE, THREAD_ALIGN, THREADINFO_GFP,
> +   NUMA_NO_NODE, (void *)_RET_IP_);
>  }
>  
>  static void __init vmap_irqstack_init(void)
> 



Re: [PATCH 28/28] s390: use __vmalloc_node in stack_alloc

2020-04-08 Thread Christian Borntraeger



On 08.04.20 13:59, Christoph Hellwig wrote:
> stack_alloc can use a slightly higher level vmalloc function.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/s390/kernel/setup.c | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/s390/kernel/setup.c b/arch/s390/kernel/setup.c
> index 36445dd40fdb..0f0b140b5558 100644
> --- a/arch/s390/kernel/setup.c
> +++ b/arch/s390/kernel/setup.c
> @@ -305,12 +305,9 @@ void *restart_stack __section(.data);
>  unsigned long stack_alloc(void)
>  {
>  #ifdef CONFIG_VMAP_STACK
> - return (unsigned long)
> - __vmalloc_node_range(THREAD_SIZE, THREAD_SIZE,
> -  VMALLOC_START, VMALLOC_END,
> -  THREADINFO_GFP,
> -  PAGE_KERNEL, 0, NUMA_NO_NODE,
> -  __builtin_return_address(0));
> + return (unsigned long)__vmalloc_node(THREAD_SIZE, THREAD_SIZE,
> + THREADINFO_GFP, NUMA_NO_NODE,
> + __builtin_return_address(0));

Looks sane.

Acked-by: Christian Borntraeger 


>  #else
>   return __get_free_pages(GFP_KERNEL, THREAD_SIZE_ORDER);
>  #endif
> 



usb: gadget: fsl_udc_core: Checking for a failed platform_get_irq() call in fsl_udc_probe()

2020-04-08 Thread Markus Elfring
Hello,

I have taken another look at the implementation of the function “fsl_udc_probe”.
A software analysis approach points the following source code out for
further development considerations.
https://elixir.bootlin.com/linux/v5.6.2/source/drivers/usb/gadget/udc/fsl_udc_core.c#L2443
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/udc/fsl_udc_core.c?id=f5e94d10e4c468357019e5c28d48499f677b284f#n2442

udc_controller->irq = platform_get_irq(pdev, 0);
if (!udc_controller->irq) {
ret = -ENODEV;
goto err_iounmap;
}


The software documentation is providing the following information
for the used programming interface.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c?id=f5e94d10e4c468357019e5c28d48499f677b284f#n221
https://elixir.bootlin.com/linux/v5.6.2/source/drivers/base/platform.c#L202

“…
 * Return: IRQ number on success, negative error number on failure.
…”

Would you like to reconsider the shown condition check?

Regards,
Markus


[Bug 207129] PowerMac G4 DP (5.6.2 debug kernel + inline KASAN) freezes shortly after booting with "do_IRQ: stack overflow: 1760"

2020-04-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=207129

--- Comment #5 from Christophe Leroy (christophe.le...@c-s.fr) ---
Ok, so as a summary:
- With CONFIG_THREAD_SHIFT = 13 and CONFIG_DEBUG_STACKOVERFLOW, the system gets
stuck
- With CONFIG_THREAD_SHIFT = 13 and without CONFIG_DEBUG_STACKOVERFLOW, stack
overflow is not really detected until it gets into kernel text !!!
- With CONFIG_THREAD_SHIFT = 14 it runs fine
- With CONFIG_VMAP_STACK, the automatic restart doesn't work
- Without CONFIG_VMAP_STACK, the automatic restart works

So I'll send a patch to set CONFIG_THREAD_SHIFT to 14 when CONFIG_KASAN is
selected. x86 and arm64 already do that.

And I'll try to investigate the other points when I have time.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH 09/28] mm: rename CONFIG_PGTABLE_MAPPING to CONFIG_ZSMALLOC_PGTABLE_MAPPING

2020-04-08 Thread Randy Dunlap
On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> Rename the Kconfig variable to clarify the scope.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm/configs/omap2plus_defconfig | 2 +-
>  include/linux/zsmalloc.h | 2 +-
>  mm/Kconfig   | 2 +-
>  mm/zsmalloc.c| 8 
>  4 files changed, 7 insertions(+), 7 deletions(-)
> 

Looks good. Thanks.

Acked-by: Randy Dunlap 


-- 
~Randy



Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Randy Dunlap
Hi,

On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 36949a9425b8..614cc786b519 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -702,7 +702,7 @@ config ZSMALLOC
>  
>  config ZSMALLOC_PGTABLE_MAPPING
>   bool "Use page table mapping to access object in zsmalloc"
> - depends on ZSMALLOC
> + depends on ZSMALLOC=y

It's a bool so this shouldn't matter... not needed.

>   help
> By default, zsmalloc uses a copy-based object mapping method to
> access allocations that span two pages. However, if a particular


-- 
~Randy



Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Peter Zijlstra
On Wed, Apr 08, 2020 at 08:01:00AM -0700, Randy Dunlap wrote:
> Hi,
> 
> On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 36949a9425b8..614cc786b519 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -702,7 +702,7 @@ config ZSMALLOC
> >  
> >  config ZSMALLOC_PGTABLE_MAPPING
> > bool "Use page table mapping to access object in zsmalloc"
> > -   depends on ZSMALLOC
> > +   depends on ZSMALLOC=y
> 
> It's a bool so this shouldn't matter... not needed.

My mm/Kconfig has:

config ZSMALLOC
tristate "Memory allocator for compressed pages"
depends on MMU

which I think means it can be modular, no?


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Matthew Wilcox
On Wed, Apr 08, 2020 at 05:12:03PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 08, 2020 at 08:01:00AM -0700, Randy Dunlap wrote:
> > Hi,
> > 
> > On 4/8/20 4:59 AM, Christoph Hellwig wrote:
> > > diff --git a/mm/Kconfig b/mm/Kconfig
> > > index 36949a9425b8..614cc786b519 100644
> > > --- a/mm/Kconfig
> > > +++ b/mm/Kconfig
> > > @@ -702,7 +702,7 @@ config ZSMALLOC
> > >  
> > >  config ZSMALLOC_PGTABLE_MAPPING
> > >   bool "Use page table mapping to access object in zsmalloc"
> > > - depends on ZSMALLOC
> > > + depends on ZSMALLOC=y
> > 
> > It's a bool so this shouldn't matter... not needed.
> 
> My mm/Kconfig has:
> 
> config ZSMALLOC
>   tristate "Memory allocator for compressed pages"
>   depends on MMU
> 
> which I think means it can be modular, no?

Randy means that ZSMALLOC_PGTABLE_MAPPING is a bool, so I think hch's patch
is wrong ... if ZSMALLOC is 'm' then ZSMALLOC_PGTABLE_MAPPING would become
'n' instead of 'y'.


Re: [Bug 206203] kmemleak reports various leaks in drivers/of/unittest.c

2020-04-08 Thread Frank Rowand
Hi Michael,

On 4/7/20 10:13 PM, Michael Ellerman wrote:
> bugzilla-dae...@bugzilla.kernel.org writes:
>> https://bugzilla.kernel.org/show_bug.cgi?id=206203
>>
>> Erhard F. (erhar...@mailbox.org) changed:
>>
>>What|Removed |Added
>> 
>>  Attachment #286801|0   |1
>> is obsolete||
>>
>> --- Comment #10 from Erhard F. (erhar...@mailbox.org) ---
>> Created attachment 288189
>>   --> https://bugzilla.kernel.org/attachment.cgi?id=288189&action=edit
>> kmemleak output (kernel 5.6.2, Talos II)
> 
> These are all in or triggered by the of unittest code AFAICS.
> Content of the log reproduced below.
> 
> Frank/Rob, are these memory leaks expected?

Thanks for the report.  I'll look at each one.

-Frank


> 
> cheers
> 
> 
> unreferenced object 0xc007eb89ca58 (size 192):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 32 bytes):
> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00  ..!8
> c0 00 00 07 ec 97 80 08 00 00 00 00 00 00 00 00  
>   backtrace:
> [<07b50c76>] .__of_node_dup+0x38/0x1c0
> [] .of_unittest_changeset+0x13c/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007ec978008 (size 8):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 8 bytes):
> 6e 31 00 6b 6b 6b 6b a5  n1..
>   backtrace:
> [] .kstrdup+0x44/0xb0
> [] .__of_node_dup+0x50/0x1c0
> [] .of_unittest_changeset+0x13c/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007eb89e318 (size 192):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 32 bytes):
> c0 00 00 00 00 d9 21 38 00 00 00 00 00 00 00 00  ..!8
> c0 00 00 07 ec 97 ab 08 00 00 00 00 00 00 00 00  
>   backtrace:
> [<07b50c76>] .__of_node_dup+0x38/0x1c0
> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007ec97ab08 (size 8):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 8 bytes):
> 6e 32 00 6b 6b 6b 6b a5  n2..
>   backtrace:
> [] .kstrdup+0x44/0xb0
> [] .__of_node_dup+0x50/0x1c0
> [<881dc9c4>] .of_unittest_changeset+0x194/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007eb89e528 (size 192):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 32 bytes):
> c0 00 00 07 ec 97 bd d8 00 00 00 00 00 00 00 00  
> c0 00 00 07 ec 97 b3 18 00 00 00 00 00 00 00 00  
>   backtrace:
> [<07b50c76>] .__of_node_dup+0x38/0x1c0
> [] .of_unittest_changeset+0x1ec/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x68
> unreferenced object 0xc007ec97b318 (size 8):
>   comm "swapper/0", pid 1, jiffies 4294878935 (age 824.747s)
>   hex dump (first 8 bytes):
> 6e 32 31 00 6b 6b 6b a5  n21.kkk.
>   backtrace:
> [] .kstrdup+0x44/0xb0
> [] .__of_node_dup+0x50/0x1c0
> [] .of_unittest_changeset+0x1ec/0xa20
> [<925a8013>] .of_unittest+0x1ba0/0x3778
> [] .do_one_initcall+0x7c/0x420
> [] .kernel_init_freeable+0x318/0x3d8
> [<01b957ee>] .kernel_init+0x14/0x168
> [<1fe347b5>] .ret_from_kernel_thread+0x58/0x

Re: [PATCH 18/28] mm: enforce that vmap can't map pages executable

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 01:38:36PM +0100, Mark Rutland wrote:
> > +static inline pgprot_t pgprot_nx(pgprot_t prot)
> > +{
> > +   return __pgprot(pgprot_val(prot) | _PAGE_NX);
> > +}
> > +#define pgprot_nx pgprot_nx
> > +
> >  #ifdef CONFIG_X86_PAE
> 
> I reckon for arm64 we can do similar in our :
> 
> #define pgprot_nx(pgprot_t prot) \
>   __pgprot_modify(prot, 0, PTE_PXN)
> 
> ... matching the style of our existing pgprot_*() modifier helpers.

I've added that for the next version with attribution to you.


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 08:15:19AM -0700, Matthew Wilcox wrote:
> > > >  config ZSMALLOC_PGTABLE_MAPPING
> > > > bool "Use page table mapping to access object in zsmalloc"
> > > > -   depends on ZSMALLOC
> > > > +   depends on ZSMALLOC=y
> > > 
> > > It's a bool so this shouldn't matter... not needed.
> > 
> > My mm/Kconfig has:
> > 
> > config ZSMALLOC
> > tristate "Memory allocator for compressed pages"
> > depends on MMU
> > 
> > which I think means it can be modular, no?
> 
> Randy means that ZSMALLOC_PGTABLE_MAPPING is a bool, so I think hch's patch
> is wrong ... if ZSMALLOC is 'm' then ZSMALLOC_PGTABLE_MAPPING would become
> 'n' instead of 'y'.

In Linus' tree you can select PGTABLE_MAPPING=y with ZSMALLOC=m,
and that fits my understanding of the kbuild language.  With this
patch I can't anymore.


Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Randy Dunlap
On 4/8/20 8:15 AM, Matthew Wilcox wrote:
> On Wed, Apr 08, 2020 at 05:12:03PM +0200, Peter Zijlstra wrote:
>> On Wed, Apr 08, 2020 at 08:01:00AM -0700, Randy Dunlap wrote:
>>> Hi,
>>>
>>> On 4/8/20 4:59 AM, Christoph Hellwig wrote:
 diff --git a/mm/Kconfig b/mm/Kconfig
 index 36949a9425b8..614cc786b519 100644
 --- a/mm/Kconfig
 +++ b/mm/Kconfig
 @@ -702,7 +702,7 @@ config ZSMALLOC
  
  config ZSMALLOC_PGTABLE_MAPPING
bool "Use page table mapping to access object in zsmalloc"
 -  depends on ZSMALLOC
 +  depends on ZSMALLOC=y
>>>
>>> It's a bool so this shouldn't matter... not needed.
>>
>> My mm/Kconfig has:
>>
>> config ZSMALLOC
>>  tristate "Memory allocator for compressed pages"
>>  depends on MMU
>>
>> which I think means it can be modular, no?

ack. I misread it.

> Randy means that ZSMALLOC_PGTABLE_MAPPING is a bool, so I think hch's patch
> is wrong ... if ZSMALLOC is 'm' then ZSMALLOC_PGTABLE_MAPPING would become
> 'n' instead of 'y'.

sigh, I wish that I had meant that. :)

thanks.

-- 
~Randy



Re: [PATCH 10/28] mm: only allow page table mappings for built-in zsmalloc

2020-04-08 Thread Randy Dunlap
On 4/8/20 8:36 AM, Christoph Hellwig wrote:
> On Wed, Apr 08, 2020 at 08:15:19AM -0700, Matthew Wilcox wrote:
>  config ZSMALLOC_PGTABLE_MAPPING
>   bool "Use page table mapping to access object in zsmalloc"
> - depends on ZSMALLOC
> + depends on ZSMALLOC=y

 It's a bool so this shouldn't matter... not needed.
>>>
>>> My mm/Kconfig has:
>>>
>>> config ZSMALLOC
>>> tristate "Memory allocator for compressed pages"
>>> depends on MMU
>>>
>>> which I think means it can be modular, no?
>>
>> Randy means that ZSMALLOC_PGTABLE_MAPPING is a bool, so I think hch's patch
>> is wrong ... if ZSMALLOC is 'm' then ZSMALLOC_PGTABLE_MAPPING would become
>> 'n' instead of 'y'.
> 
> In Linus' tree you can select PGTABLE_MAPPING=y with ZSMALLOC=m,
> and that fits my understanding of the kbuild language.  With this
> patch I can't anymore.
> 

Makes sense. thanks.

-- 
~Randy



Re: [PATCH 02/28] staging: android: ion: use vmap instead of vm_map_ram

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 08:48:33PM +0800, Hillf Danton wrote:
> > -   void *addr = vm_map_ram(pages, num, -1, pgprot);
> > +   void *addr = vmap(pages, num, VM_MAP);
> 
> A merge glitch?
> 
> void *vmap(struct page **pages, unsigned int count,
>  unsigned long flags, pgprot_t prot)

Yes, thanks for the headsup, you were as fast as the build bot :)

Fixed now.


[PATCH 00/35] Documentation fixes for Kernel 5.8

2020-04-08 Thread Mauro Carvalho Chehab
Hi Jon,

I have a large list of patches this time for the Documentation/. So, I'm
starting sending them a little earier. Yet, those are meant to be applied
after the end of the merge window. They're based on today's linux-next,
with has only 49 patches pending to be applied upstream touching
Documentation/, so I don't expect much conflicts if applied early at
-rc cycle.

Most of the patches here were already submitted, but weren't
merged yet at next. So, it seems that nobody picked them yet.

In any case, most of those patches here are independent from 
the others.

The number of doc build warnings have been rising with time.
The main goal with this series is to get rid of most Sphinx warnings
and other errors.

Patches 1 to 5: fix broken references detected by this tool:

./scripts/documentation-file-ref-check

The other patches fix other random errors due to tags being
mis-interpreted or mis-used.

You should notice that several patches touch kernel-doc scripts.
IMHO, some of the warnings are actually due to kernel-doc being
too pedantic. So, I ended by improving some things at the toolset,
in order to make it smarter. That's the case of those patches:

docs: scripts/kernel-doc: accept blank lines on parameter description
scripts: kernel-doc: accept negation like !@var
scripts: kernel-doc: proper handle @foo->bar()

The last 4 patches address problems with PDF building.

The first one address a conflict that will rise during the merge
window: Documentation/media will be removed. Instead of
just drop it from the list of PDF documents, I opted to drop the
entire list, as conf.py will auto-generate from the sources:

docs: LaTeX/PDF: drop list of documents

Also, right now, PDF output is broken due to a namespace conflict 
at I2c (two pdf outputs there will have the same name).

docs: i2c: rename i2c.svg to i2c_bus.svg

The third PDF patch is not really a fix, but it helps a lot to identify
if the build succeeded or not, by placing the final PDF output on
a separate dir:

docs: Makefile: place final pdf docs on a separate dir

Finally, the last one solves a bug since the first supported Sphinx
version, with also impacts PDF output: basically while nested tables
are valid with ReST notation, the toolset only started supporting
it on PDF output since version 2.4:

docs: update recommended Sphinx version to 2.4.4

PS.: Due to the large number of C/C, I opted to keep a smaller
set of C/C at this first e-mail (only e-mails with "L:" tag from
MAINTAINERS file).

Mauro Carvalho Chehab (35):
  MAINTAINERS: dt: update display/allwinner file entry
  docs: dt: fix broken reference to phy-cadence-torrent.yaml
  docs: fix broken references to text files
  docs: fix broken references for ReST files that moved around
  docs: filesystems: fix renamed references
  docs: amu: supress some Sphinx warnings
  docs: arm64: booting.rst: get rid of some warnings
  docs: pci: boot-interrupts.rst: improve html output
  futex: get rid of a kernel-docs build warning
  firewire: firewire-cdev.hL get rid of a docs warning
  scripts: kernel-doc: proper handle @foo->bar()
  lib: bitmap.c: get rid of some doc warnings
  ata: libata-core: fix a doc warning
  fs: inode.c: get rid of docs warnings
  docs: ras: get rid of some warnings
  docs: ras: don't need to repeat twice the same thing
  docs: watch_queue.rst: supress some Sphinx warnings
  scripts: kernel-doc: accept negation like !@var
  docs: infiniband: verbs.c: fix some documentation warnings
  docs: scripts/kernel-doc: accept blank lines on parameter description
  docs: spi: spi.h: fix a doc building warning
  docs: drivers: fix some warnings at base/platform.c when building docs
  docs: fusion: mptbase.c: get rid of a doc build warning
  docs: mm: slab.h: fix a broken cross-reference
  docs mm: userfaultfd.rst: use ``foo`` for literals
  docs: mm: userfaultfd.rst: use a cross-reference for a section
  docs: vm: index.rst: add an orphan doc to the building system
  docs: dt: qcom,dwc3.txt: fix cross-reference for a converted file
  MAINTAINERS: dt: fix pointers for ARM Integrator, Versatile and
RealView
  docs: dt: fix a broken reference for a file converted to json
  powerpc: docs: cxl.rst: mark two section titles as such
  docs: LaTeX/PDF: drop list of documents
  docs: i2c: rename i2c.svg to i2c_bus.svg
  docs: Makefile: place final pdf docs on a separate dir
  docs: update recommended Sphinx version to 2.4.4

 Documentation/ABI/stable/sysfs-devices-node   |   2 +-
 Documentation/ABI/testing/procfs-smaps_rollup |   2 +-
 Documentation/Makefile|   6 +-
 Documentation/PCI/boot-interrupts.rst |  34 +--
 Documentation/admin-guide/cpu-load.rst|   2 +-
 Documentation/admin-guide/mm/userfaultfd.rst  | 209 +-
 Documentation/admin-guide/nfs/nfsroot.rst |   2 +-
 Documentation/admin-guide/ras.rst |  18 +-
 Documentation/arm64/amu.rst   |

[PATCH 31/35] powerpc: docs: cxl.rst: mark two section titles as such

2020-04-08 Thread Mauro Carvalho Chehab
The User API chapter contains two sub-chapters. Mark them as
such.

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/powerpc/cxl.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/powerpc/cxl.rst b/Documentation/powerpc/cxl.rst
index 920546d81326..d2d77057610e 100644
--- a/Documentation/powerpc/cxl.rst
+++ b/Documentation/powerpc/cxl.rst
@@ -133,6 +133,7 @@ User API
 
 
 1. AFU character devices
+
 
 For AFUs operating in AFU directed mode, two character device
 files will be created. /dev/cxl/afu0.0m will correspond to a
@@ -395,6 +396,7 @@ read
 
 
 2. Card character device (powerVM guest only)
+^
 
 In a powerVM guest, an extra character device is created for the
 card. The device is only used to write (flash) a new image on the
-- 
2.25.2



[PATCH 03/35] docs: fix broken references to text files

2020-04-08 Thread Mauro Carvalho Chehab
Several references got broken due to txt to ReST conversion.

Several of them can be automatically fixed with:

scripts/documentation-file-ref-check --fix

Reviewed-by: Mathieu Poirier  # 
hwtracing/coresight/Kconfig
Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/memory-barriers.txt|  2 +-
 Documentation/process/submit-checklist.rst   |  2 +-
 .../translations/it_IT/process/submit-checklist.rst  |  2 +-
 Documentation/translations/ko_KR/memory-barriers.txt |  2 +-
 .../translations/zh_CN/filesystems/sysfs.txt |  2 +-
 .../translations/zh_CN/process/submit-checklist.rst  |  2 +-
 Documentation/virt/kvm/arm/pvtime.rst|  2 +-
 Documentation/virt/kvm/devices/vcpu.rst  |  2 +-
 Documentation/virt/kvm/hypercalls.rst|  4 ++--
 arch/powerpc/include/uapi/asm/kvm_para.h |  2 +-
 drivers/gpu/drm/Kconfig  |  2 +-
 drivers/gpu/drm/drm_ioctl.c  |  2 +-
 drivers/hwtracing/coresight/Kconfig  |  2 +-
 fs/fat/Kconfig   |  8 
 fs/fuse/Kconfig  |  2 +-
 fs/fuse/dev.c|  2 +-
 fs/overlayfs/Kconfig |  6 +++---
 include/linux/mm.h   |  4 ++--
 include/uapi/linux/ethtool_netlink.h |  2 +-
 include/uapi/rdma/rdma_user_ioctl_cmds.h |  2 +-
 mm/gup.c | 12 ++--
 virt/kvm/arm/vgic/vgic-mmio-v3.c |  2 +-
 virt/kvm/arm/vgic/vgic.h |  4 ++--
 23 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/Documentation/memory-barriers.txt 
b/Documentation/memory-barriers.txt
index e1c355e84edd..eaabc3134294 100644
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -620,7 +620,7 @@ because the CPUs that the Linux kernel supports don't do 
writes
 until they are certain (1) that the write will actually happen, (2)
 of the location of the write, and (3) of the value to be written.
 But please carefully read the "CONTROL DEPENDENCIES" section and the
-Documentation/RCU/rcu_dereference.txt file:  The compiler can and does
+Documentation/RCU/rcu_dereference.rst file:  The compiler can and does
 break dependencies in a great many highly creative ways.
 
CPU 1 CPU 2
diff --git a/Documentation/process/submit-checklist.rst 
b/Documentation/process/submit-checklist.rst
index 8e56337d422d..3f8e9d5d95c2 100644
--- a/Documentation/process/submit-checklist.rst
+++ b/Documentation/process/submit-checklist.rst
@@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches.
 and why.
 
 26) If any ioctl's are added by the patch, then also update
-``Documentation/ioctl/ioctl-number.rst``.
+``Documentation/userspace-api/ioctl/ioctl-number.rst``.
 
 27) If your modified source code depends on or uses any of the kernel
 APIs or features that are related to the following ``Kconfig`` symbols,
diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst 
b/Documentation/translations/it_IT/process/submit-checklist.rst
index 995ee69fab11..3e575502690f 100644
--- a/Documentation/translations/it_IT/process/submit-checklist.rst
+++ b/Documentation/translations/it_IT/process/submit-checklist.rst
@@ -117,7 +117,7 @@ sottomissione delle patch, in particolare
 sorgenti che ne spieghi la logica: cosa fanno e perché.
 
 25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate
-``Documentation/ioctl/ioctl-number.rst``.
+``Documentation/userspace-api/ioctl/ioctl-number.rst``.
 
 26) Se il codice che avete modificato dipende o usa una qualsiasi interfaccia o
 funzionalità del kernel che è associata a uno dei seguenti simboli
diff --git a/Documentation/translations/ko_KR/memory-barriers.txt 
b/Documentation/translations/ko_KR/memory-barriers.txt
index 2e831ece6e26..e50fe6541335 100644
--- a/Documentation/translations/ko_KR/memory-barriers.txt
+++ b/Documentation/translations/ko_KR/memory-barriers.txt
@@ -641,7 +641,7 @@ P 는 짝수 번호 캐시 라인에 저장되어 있고, 변수 B 는 홀수 
 리눅스 커널이 지원하는 CPU 들은 (1) 쓰기가 정말로 일어날지, (2) 쓰기가 어디에
 이루어질지, 그리고 (3) 쓰여질 값을 확실히 알기 전까지는 쓰기를 수행하지 않기
 때문입니다.  하지만 "컨트롤 의존성" 섹션과
-Documentation/RCU/rcu_dereference.txt 파일을 주의 깊게 읽어 주시기 바랍니다:
+Documentation/RCU/rcu_dereference.rst 파일을 주의 깊게 읽어 주시기 바랍니다:
 컴파일러는 매우 창의적인 많은 방법으로 종속성을 깰 수 있습니다.
 
CPU 1 CPU 2
diff --git a/Documentation/translations/zh_CN/filesystems/sysfs.txt 
b/Documentation/translations/zh_CN/filesystems/sysfs.txt
index ee1f37da5b23..a15c3ebdfa82 100644
--- a/Documentation/translations/zh_CN/filesystems/sysfs.txt
+++ b/Documentation/translations/zh_CN/filesystems/sysfs.txt
@@ -281,7 +281,7 @@ drivers/ 包含了每个已为特定总线上的设备而挂载的驱动程序
 假定驱动没有跨越多个总线类型)。
 
 fs/ 包含了一个为文件系统设立的目录。现在每个想要导出属性的文件系统必须
-在 fs/ 

[PATCH 05/35] docs: filesystems: fix renamed references

2020-04-08 Thread Mauro Carvalho Chehab
Some filesystem references got broken by a previous patch
series I submitted. Address those.

Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/ABI/stable/sysfs-devices-node | 2 +-
 Documentation/ABI/testing/procfs-smaps_rollup   | 2 +-
 Documentation/admin-guide/cpu-load.rst  | 2 +-
 Documentation/admin-guide/nfs/nfsroot.rst   | 2 +-
 Documentation/driver-api/driver-model/device.rst| 4 ++--
 Documentation/driver-api/driver-model/overview.rst  | 2 +-
 Documentation/filesystems/dax.txt   | 2 +-
 Documentation/filesystems/dnotify.txt   | 2 +-
 Documentation/filesystems/ramfs-rootfs-initramfs.rst| 2 +-
 Documentation/filesystems/sysfs.rst | 2 +-
 Documentation/powerpc/firmware-assisted-dump.rst| 2 +-
 Documentation/process/adding-syscalls.rst   | 2 +-
 .../translations/it_IT/process/adding-syscalls.rst  | 2 +-
 Documentation/translations/zh_CN/filesystems/sysfs.txt  | 6 +++---
 drivers/base/core.c | 2 +-
 drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 2 +-
 fs/Kconfig  | 2 +-
 fs/Kconfig.binfmt   | 2 +-
 fs/adfs/Kconfig | 2 +-
 fs/affs/Kconfig | 2 +-
 fs/afs/Kconfig  | 6 +++---
 fs/bfs/Kconfig  | 2 +-
 fs/cramfs/Kconfig   | 2 +-
 fs/ecryptfs/Kconfig | 2 +-
 fs/hfs/Kconfig  | 2 +-
 fs/hpfs/Kconfig | 2 +-
 fs/isofs/Kconfig| 2 +-
 fs/namespace.c  | 2 +-
 fs/notify/inotify/Kconfig   | 2 +-
 fs/ntfs/Kconfig | 2 +-
 fs/ocfs2/Kconfig| 2 +-
 fs/proc/Kconfig | 4 ++--
 fs/romfs/Kconfig| 2 +-
 fs/sysfs/dir.c  | 2 +-
 fs/sysfs/file.c | 2 +-
 fs/sysfs/mount.c| 2 +-
 fs/sysfs/symlink.c  | 2 +-
 fs/sysv/Kconfig | 2 +-
 fs/udf/Kconfig  | 2 +-
 include/linux/kobject.h | 2 +-
 include/linux/kobject_ns.h  | 2 +-
 include/linux/relay.h   | 2 +-
 include/linux/sysfs.h   | 2 +-
 kernel/relay.c  | 2 +-
 lib/kobject.c   | 4 ++--
 45 files changed, 52 insertions(+), 52 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-devices-node 
b/Documentation/ABI/stable/sysfs-devices-node
index df8413cf1468..484fc04bcc25 100644
--- a/Documentation/ABI/stable/sysfs-devices-node
+++ b/Documentation/ABI/stable/sysfs-devices-node
@@ -54,7 +54,7 @@ Date: October 2002
 Contact:   Linux Memory Management list 
 Description:
Provides information about the node's distribution and memory
-   utilization. Similar to /proc/meminfo, see 
Documentation/filesystems/proc.txt
+   utilization. Similar to /proc/meminfo, see 
Documentation/filesystems/proc.rst
 
 What:  /sys/devices/system/node/nodeX/numastat
 Date:  October 2002
diff --git a/Documentation/ABI/testing/procfs-smaps_rollup 
b/Documentation/ABI/testing/procfs-smaps_rollup
index 274df44d8b1b..046978193368 100644
--- a/Documentation/ABI/testing/procfs-smaps_rollup
+++ b/Documentation/ABI/testing/procfs-smaps_rollup
@@ -11,7 +11,7 @@ Description:
Additionally, the fields Pss_Anon, Pss_File and Pss_Shmem
are not present in /proc/pid/smaps.  These fields represent
the sum of the Pss field of each type (anon, file, shmem).
-   For more details, see Documentation/filesystems/proc.txt
+   For more details, see Documentation/filesystems/proc.rst
and the procfs man page.
 
Typical output looks like this:
diff --git a/Documentation/admin-guide/cpu-load.rst 
b/Documentation/admin-guide/cpu-load.rst
index 2d01ce43d2a2..ebdecf864080 100644
--- a/Documentation/admin-guide/cpu-load.rst
+++ b/Documentation/admin-guide/cpu-load.rst
@@ -105,7 +105,7 @@ References
 --
 
 - http://lkml.or

Re: [PATCH 05/35] docs: filesystems: fix renamed references

2020-04-08 Thread David Sterba
On Wed, Apr 08, 2020 at 05:45:57PM +0200, Mauro Carvalho Chehab wrote:
> Some filesystem references got broken by a previous patch
> series I submitted. Address those.
> 
> Signed-off-by: Mauro Carvalho Chehab 

For

>  fs/affs/Kconfig | 2 +-

Acked-by: David Sterba 


[PATCH] powerpc/kasan: Fix stack overflow by increasing THREAD_SHIFT

2020-04-08 Thread Christophe Leroy
When CONFIG_KASAN is selected, the stack usage is increased.

In the same way as x86 and arm64 architectures, increase
THREAD_SHIFT when CONFIG_KASAN is selected.

Fixes: 2edb16efc899 ("powerpc/32: Add KASAN support")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=207129
Reported-by: 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 05d20a8d6581..511f9bbc 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -771,6 +771,7 @@ config THREAD_SHIFT
range 13 15
default "15" if PPC_256K_PAGES
default "14" if PPC64
+   default "14" if KASAN
default "13"
help
  Used to define the stack size. The default is almost always what you
-- 
2.25.0



[Bug 207129] PowerMac G4 DP (5.6.2 debug kernel + inline KASAN) freezes shortly after booting with "do_IRQ: stack overflow: 1760"

2020-04-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=207129

--- Comment #6 from Erhard F. (erhar...@mailbox.org) ---
Yes, precisely summarized! Thanks for your efforts!

CONFIG_KASAN though only is x86_64 not x86 AFAIK.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: decruft the vmalloc API

2020-04-08 Thread Russell King - ARM Linux admin
On Wed, Apr 08, 2020 at 01:58:58PM +0200, Christoph Hellwig wrote:
> Hi all,
> 
> Peter noticed that with some dumb luck you can toast the kernel address
> space with exported vmalloc symbols.
> 
> I used this as an opportunity to decruft the vmalloc.c API and make it
> much more systematic.  This also removes any chance to create vmalloc
> mappings outside the designated areas or using executable permissions
> from modules.  Besides that it removes more than 300 lines of code.

I haven't read all your patches yet.

Have you tested it on 32-bit ARM, where the module area is located
_below_ PAGE_OFFSET and outside of the vmalloc area?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up


Re: decruft the vmalloc API

2020-04-08 Thread Christoph Hellwig
On Wed, Apr 08, 2020 at 05:03:24PM +0100, Russell King - ARM Linux admin wrote:
> I haven't read all your patches yet.
> 
> Have you tested it on 32-bit ARM, where the module area is located
> _below_ PAGE_OFFSET and outside of the vmalloc area?

I have not tested it.  However existing in-kernel users that use
different areas (and we have quite a few of those) have not been
changed at all.  I think the arm32 module loader (like various other
module loaders) falls into that category.


Section mismatch in reference from the function .early_init_mmu() to the function .init.text:.radix__early_init_mmu() after PowerPC updates 5.7-1

2020-04-08 Thread Christian Zigotzky

Hello,

Since the PowerPC updates 5.7-1 we have the following issue during the 
linking of vmlinux:


MODPOST vmlinux.o
WARNING: modpost: vmlinux.o(.text.unlikely+0x1a0): Section mismatch in 
reference from the function .early_init_mmu() to the function 
.init.text:.radix__early_init_mmu()

The function .early_init_mmu() references
the function __init .radix__early_init_mmu().
This is often because .early_init_mmu lacks a __init
annotation or the annotation of .radix__early_init_mmu is wrong.

WARNING: modpost: vmlinux.o(.text.unlikely+0x1ac): Section mismatch in 
reference from the function .early_init_mmu() to the function 
.init.text:.hash__early_init_mmu()

The function .early_init_mmu() references
the function __init .hash__early_init_mmu().
This is often because .early_init_mmu lacks a __init
annotation or the annotation of .hash__early_init_mmu is wrong.

---

But the kernel works without any problems after the linking.

I reverted the following commits:

70fbdfef4ba63eeef83b2c94eac9a5a9f913e442 -- sysfs: remove redundant 
__compat_only_sysfs_link_entry_to_kobj fn


d38c07afc356ddebaa3ed8ecb3f553340e05c969 -- Merge tag 'powerpc-5.7-1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux


And after that the linking of vmlinux works again.

Please check the PowerPC updates 5.7-1.

Thanks,
Christian


Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Leonardo Bras
On Wed, 2020-04-08 at 22:21 +1000, Michael Ellerman wrote:
[...]
> > On the other hand, busting the rtas.lock could be dangerous, because
> > it's code we can't control.
> > 
> > According with LoPAR, for both of these rtas-calls, we have:
> > 
> > For the PowerPC External Interrupt option: The call must be reentrant
> > to the number of processors on the platform.
> > For the PowerPC External Interrupt option: The argument call buffer for
> > each simultaneous call must be physically unique.
> 
> Oh well spotted. Where is that in the doc?

In the current LoPAR available on OpenPower Foundation, it's on page
170, '7.3.10.2 ibm,set-xive' and '7.3.10.3 ibm,int-off'.

> > Which I think means this rtas-calls can be done simultaneously.
> 
> I think so too. I'll read PAPR in the morning and make sure.
> 
> > Would it mean that busting the rtas.lock for these calls would be safe?
> 
> What would be better is to make those specific calls not take the global
> RTAS lock to begin with.
> 
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

Good idea. I will try getting some work done on this.

Best regards,


signature.asc
Description: This is a digitally signed message part


Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Leonardo Bras
On Wed, 2020-04-08 at 22:21 +1000, Michael Ellerman wrote:
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

At this point, would it be a problem using kmalloc? 

Best regards,


signature.asc
Description: This is a digitally signed message part


Re: [PATCH v5 18/21] powerpc64: Add prefixed instructions to instruction data type

2020-04-08 Thread Segher Boessenkool
On Mon, Apr 06, 2020 at 12:25:27PM +0200, Christophe Leroy wrote:
> > if (ppc_inst_prefixed(x) != ppc_inst_prefixed(y))
> > return false;
> > else if (ppc_inst_prefixed(x))
> > return !memcmp(&x, &y, sizeof(struct ppc_inst));
> 
> Are we sure memcmp() is a good candidate for the comparison ? Can we do 
> simpler ? Especially, I understood a prefixed instruction is a 64 bits 
> properly aligned instruction, can we do a simple u64 compare ? Or is GCC 
> intelligent enough to do that without calling memcmp() function which is 
> heavy ?

A prefixed insn is *not* 8-byte aligned, it is 4-byte aligned, fwiw.

memcmp() isn't as heavy as you fear, not with a non-ancient GCC at least.
But this could be written in a nicer way, sure :-)


Segher


Re: [PATCH v5 05/21] powerpc: Use a function for getting the instruction op code

2020-04-08 Thread Segher Boessenkool
Hi!

On Mon, Apr 06, 2020 at 06:09:20PM +1000, Jordan Niethe wrote:
> +static inline int ppc_inst_opcode(u32 x)
> +{
> + return x >> 26;
> +}

Maybe you should have "primary opcode" in this function name?


Segher


Re: [PATCH 1/1] powerpc/crash: Use NMI context for printk after crashing other CPUs

2020-04-08 Thread Leonardo Bras
On Wed, 2020-04-08 at 22:13 +1000, Michael Ellerman wrote:
[...]
> Added context:
> 
>   printk(KERN_EMERG "Sending IPI to other CPUs\n");
> 
>   if (crash_wake_offline)
>   ncpus = num_present_cpus() - 1;
> 
> >  
> > crash_send_ipi(crash_ipi_callback);
> > smp_wmb();
> > +   printk_nmi_enter();
>   
> Why did you decide to put it there, rather than at the start of
> default_machine_crash_shutdown() like I did?
> 
> The printk() above could have already deadlocked if another CPU is stuck
> with the logbuf lock held.

Oh, I thought the CPUs would start crashing after crash_send_ipi(), so
only printk() after that would possibly deadlock.

I was not able to see how the printk() above would deadlock, but I see
no problem adding that at the start of the function.

Best regards,
Leonardo Bras


signature.asc
Description: This is a digitally signed message part


Re: [PATCH v5 18/21] powerpc64: Add prefixed instructions to instruction data type

2020-04-08 Thread Christophe Leroy




Le 08/04/2020 à 20:11, Segher Boessenkool a écrit :

On Mon, Apr 06, 2020 at 12:25:27PM +0200, Christophe Leroy wrote:

if (ppc_inst_prefixed(x) != ppc_inst_prefixed(y))
return false;
else if (ppc_inst_prefixed(x))
return !memcmp(&x, &y, sizeof(struct ppc_inst));


Are we sure memcmp() is a good candidate for the comparison ? Can we do
simpler ? Especially, I understood a prefixed instruction is a 64 bits
properly aligned instruction, can we do a simple u64 compare ? Or is GCC
intelligent enough to do that without calling memcmp() function which is
heavy ?


A prefixed insn is *not* 8-byte aligned, it is 4-byte aligned, fwiw.


Ah, yes, I read too fast https://patchwork.ozlabs.org/patch/1266721/

It's not 64 bits, it is 64 bytes.



memcmp() isn't as heavy as you fear, not with a non-ancient GCC at least.
But this could be written in a nicer way, sure :-)


Segher



Christophe


[PATCH] soc: fsl: dpio: avoid stack usage warning

2020-04-08 Thread Arnd Bergmann
A 1024 byte variable on the stack will warn on any 32-bit architecture
during compile-testing, and is generally a bad idea anyway:

fsl/dpio/dpio-service.c: In function 
'dpaa2_io_service_enqueue_multiple_desc_fq':
fsl/dpio/dpio-service.c:495:1: error: the frame size of 1032 bytes is larger 
than 1024 bytes [-Werror=frame-larger-than=]

There are currently no callers of this function, so I cannot tell whether
dynamic memory allocation is allowed once callers are added. Change
it to kcalloc for now, if anyone gets a warning about calling this in
atomic context after they start using it, they can fix it later.

Fixes: 9d98809711ae ("soc: fsl: dpio: Adding QMAN multiple enqueue interface")
Signed-off-by: Arnd Bergmann 
---
 drivers/soc/fsl/dpio/dpio-service.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/soc/fsl/dpio/dpio-service.c 
b/drivers/soc/fsl/dpio/dpio-service.c
index cd4f6410e8c2..ff0ef8cbdbff 100644
--- a/drivers/soc/fsl/dpio/dpio-service.c
+++ b/drivers/soc/fsl/dpio/dpio-service.c
@@ -478,12 +478,17 @@ int dpaa2_io_service_enqueue_multiple_desc_fq(struct 
dpaa2_io *d,
const struct dpaa2_fd *fd,
int nb)
 {
-   int i;
-   struct qbman_eq_desc ed[32];
+   struct qbman_eq_desc *ed = kcalloc(sizeof(struct qbman_eq_desc), 32, 
GFP_KERNEL);
+   int i, ret;
+
+   if (!ed)
+   return -ENOMEM;
 
d = service_select(d);
-   if (!d)
-   return -ENODEV;
+   if (!d) {
+   ret = -ENODEV;
+   goto out;
+   }
 
for (i = 0; i < nb; i++) {
qbman_eq_desc_clear(&ed[i]);
@@ -491,7 +496,10 @@ int dpaa2_io_service_enqueue_multiple_desc_fq(struct 
dpaa2_io *d,
qbman_eq_desc_set_fq(&ed[i], fqid[i]);
}
 
-   return qbman_swp_enqueue_multiple_desc(d->swp, &ed[0], fd, nb);
+   ret = qbman_swp_enqueue_multiple_desc(d->swp, &ed[0], fd, nb);
+out:
+   kfree(ed);
+   return ret;
 }
 EXPORT_SYMBOL(dpaa2_io_service_enqueue_multiple_desc_fq);
 
-- 
2.26.0



[PATCH] soc: fsl: dpio: fix incorrect pointer conversions

2020-04-08 Thread Arnd Bergmann
Building dpio for 32 bit shows a new compiler warning from converting
a pointer to a u64:

drivers/soc/fsl/dpio/qbman-portal.c: In function 
'qbman_swp_enqueue_multiple_desc_direct':
drivers/soc/fsl/dpio/qbman-portal.c:870:14: warning: cast from pointer to 
integer of different size [-Wpointer-to-int-cast]
  870 |  addr_cena = (uint64_t)s->addr_cena;

The variable is not used anywhere, so removing the assignment seems
to be the correct workaround. After spotting what seemed to be
some confusion about address spaces, I ran the file through sparse,
which showed more warnings:

drivers/soc/fsl/dpio/qbman-portal.c:756:42: warning: incorrect type in argument 
1 (different address spaces)
drivers/soc/fsl/dpio/qbman-portal.c:756:42:expected void const volatile 
[noderef]  *addr
drivers/soc/fsl/dpio/qbman-portal.c:756:42:got unsigned int [usertype] 
*[assigned] p
drivers/soc/fsl/dpio/qbman-portal.c:902:42: warning: incorrect type in argument 
1 (different address spaces)
drivers/soc/fsl/dpio/qbman-portal.c:902:42:expected void const volatile 
[noderef]  *addr
drivers/soc/fsl/dpio/qbman-portal.c:902:42:got unsigned int [usertype] 
*[assigned] p

Here, the problem is passing a token from memremap() into __raw_readl(),
which is only defined to work on MMIO addresses but not RAM. Turning
this into a simple pointer dereference avoids this warning as well.

Fixes: 3b2abda7d28c ("soc: fsl: dpio: Replace QMAN array mode with ring mode 
enqueue")
Signed-off-by: Arnd Bergmann 
---
 drivers/soc/fsl/dpio/qbman-portal.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/dpio/qbman-portal.c 
b/drivers/soc/fsl/dpio/qbman-portal.c
index d1f49caa5b13..804b8ba9bf5c 100644
--- a/drivers/soc/fsl/dpio/qbman-portal.c
+++ b/drivers/soc/fsl/dpio/qbman-portal.c
@@ -753,7 +753,7 @@ int qbman_swp_enqueue_multiple_mem_back(struct qbman_swp *s,
if (!s->eqcr.available) {
eqcr_ci = s->eqcr.ci;
p = s->addr_cena + QBMAN_CENA_SWP_EQCR_CI_MEMBACK;
-   s->eqcr.ci = __raw_readl(p) & full_mask;
+   s->eqcr.ci = *p & full_mask;
s->eqcr.available = qm_cyc_diff(s->eqcr.pi_ring_size,
eqcr_ci, s->eqcr.ci);
if (!s->eqcr.available) {
@@ -823,7 +823,6 @@ int qbman_swp_enqueue_multiple_desc_direct(struct qbman_swp 
*s,
const uint32_t *cl;
uint32_t eqcr_ci, eqcr_pi, half_mask, full_mask;
int i, num_enqueued = 0;
-   uint64_t addr_cena;
 
half_mask = (s->eqcr.pi_ci_mask>>1);
full_mask = s->eqcr.pi_ci_mask;
@@ -867,7 +866,6 @@ int qbman_swp_enqueue_multiple_desc_direct(struct qbman_swp 
*s,
 
/* Flush all the cacheline without load/store in between */
eqcr_pi = s->eqcr.pi;
-   addr_cena = (uint64_t)s->addr_cena;
for (i = 0; i < num_enqueued; i++)
eqcr_pi++;
s->eqcr.pi = eqcr_pi & full_mask;
@@ -901,7 +899,7 @@ int qbman_swp_enqueue_multiple_desc_mem_back(struct 
qbman_swp *s,
if (!s->eqcr.available) {
eqcr_ci = s->eqcr.ci;
p = s->addr_cena + QBMAN_CENA_SWP_EQCR_CI_MEMBACK;
-   s->eqcr.ci = __raw_readl(p) & full_mask;
+   s->eqcr.ci = *p & full_mask;
s->eqcr.available = qm_cyc_diff(s->eqcr.pi_ring_size,
eqcr_ci, s->eqcr.ci);
if (!s->eqcr.available)
-- 
2.26.0



Re: [PATCH] soc: fsl: dpio: avoid stack usage warning

2020-04-08 Thread Russell King - ARM Linux admin
On Wed, Apr 08, 2020 at 08:58:16PM +0200, Arnd Bergmann wrote:
> A 1024 byte variable on the stack will warn on any 32-bit architecture
> during compile-testing, and is generally a bad idea anyway:
> 
> fsl/dpio/dpio-service.c: In function 
> 'dpaa2_io_service_enqueue_multiple_desc_fq':
> fsl/dpio/dpio-service.c:495:1: error: the frame size of 1032 bytes is larger 
> than 1024 bytes [-Werror=frame-larger-than=]
> 
> There are currently no callers of this function, so I cannot tell whether
> dynamic memory allocation is allowed once callers are added. Change
> it to kcalloc for now, if anyone gets a warning about calling this in
> atomic context after they start using it, they can fix it later.
> 
> Fixes: 9d98809711ae ("soc: fsl: dpio: Adding QMAN multiple enqueue interface")
> Signed-off-by: Arnd Bergmann 
> ---
>  drivers/soc/fsl/dpio/dpio-service.c | 18 +-
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/soc/fsl/dpio/dpio-service.c 
> b/drivers/soc/fsl/dpio/dpio-service.c
> index cd4f6410e8c2..ff0ef8cbdbff 100644
> --- a/drivers/soc/fsl/dpio/dpio-service.c
> +++ b/drivers/soc/fsl/dpio/dpio-service.c
> @@ -478,12 +478,17 @@ int dpaa2_io_service_enqueue_multiple_desc_fq(struct 
> dpaa2_io *d,
>   const struct dpaa2_fd *fd,
>   int nb)
>  {
> - int i;
> - struct qbman_eq_desc ed[32];
> + struct qbman_eq_desc *ed = kcalloc(sizeof(struct qbman_eq_desc), 32, 
> GFP_KERNEL);

I think you need to rearrange this to be more compliant with the coding
style.

> + int i, ret;
> +
> + if (!ed)
> + return -ENOMEM;
>  
>   d = service_select(d);
> - if (!d)
> - return -ENODEV;
> + if (!d) {
> + ret = -ENODEV;
> + goto out;
> + }
>  
>   for (i = 0; i < nb; i++) {
>   qbman_eq_desc_clear(&ed[i]);
> @@ -491,7 +496,10 @@ int dpaa2_io_service_enqueue_multiple_desc_fq(struct 
> dpaa2_io *d,
>   qbman_eq_desc_set_fq(&ed[i], fqid[i]);
>   }
>  
> - return qbman_swp_enqueue_multiple_desc(d->swp, &ed[0], fd, nb);
> + ret = qbman_swp_enqueue_multiple_desc(d->swp, &ed[0], fd, nb);
> +out:
> + kfree(ed);
> + return ret;
>  }
>  EXPORT_SYMBOL(dpaa2_io_service_enqueue_multiple_desc_fq);
>  
> -- 
> 2.26.0
> 
> 

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up


Re: [PATCH 03/35] docs: fix broken references to text files

2020-04-08 Thread Paul E. McKenney
On Wed, Apr 08, 2020 at 05:45:55PM +0200, Mauro Carvalho Chehab wrote:
> Several references got broken due to txt to ReST conversion.
> 
> Several of them can be automatically fixed with:
> 
>   scripts/documentation-file-ref-check --fix
> 
> Reviewed-by: Mathieu Poirier  # 
> hwtracing/coresight/Kconfig
> Signed-off-by: Mauro Carvalho Chehab 

For the memory-barrier.txt portions:

Reviewed-by: Paul E. McKenney 

> ---
>  Documentation/memory-barriers.txt|  2 +-
>  Documentation/process/submit-checklist.rst   |  2 +-
>  .../translations/it_IT/process/submit-checklist.rst  |  2 +-
>  Documentation/translations/ko_KR/memory-barriers.txt |  2 +-
>  .../translations/zh_CN/filesystems/sysfs.txt |  2 +-
>  .../translations/zh_CN/process/submit-checklist.rst  |  2 +-
>  Documentation/virt/kvm/arm/pvtime.rst|  2 +-
>  Documentation/virt/kvm/devices/vcpu.rst  |  2 +-
>  Documentation/virt/kvm/hypercalls.rst|  4 ++--
>  arch/powerpc/include/uapi/asm/kvm_para.h |  2 +-
>  drivers/gpu/drm/Kconfig  |  2 +-
>  drivers/gpu/drm/drm_ioctl.c  |  2 +-
>  drivers/hwtracing/coresight/Kconfig  |  2 +-
>  fs/fat/Kconfig   |  8 
>  fs/fuse/Kconfig  |  2 +-
>  fs/fuse/dev.c|  2 +-
>  fs/overlayfs/Kconfig |  6 +++---
>  include/linux/mm.h   |  4 ++--
>  include/uapi/linux/ethtool_netlink.h |  2 +-
>  include/uapi/rdma/rdma_user_ioctl_cmds.h |  2 +-
>  mm/gup.c | 12 ++--
>  virt/kvm/arm/vgic/vgic-mmio-v3.c |  2 +-
>  virt/kvm/arm/vgic/vgic.h |  4 ++--
>  23 files changed, 36 insertions(+), 36 deletions(-)
> 
> diff --git a/Documentation/memory-barriers.txt 
> b/Documentation/memory-barriers.txt
> index e1c355e84edd..eaabc3134294 100644
> --- a/Documentation/memory-barriers.txt
> +++ b/Documentation/memory-barriers.txt
> @@ -620,7 +620,7 @@ because the CPUs that the Linux kernel supports don't do 
> writes
>  until they are certain (1) that the write will actually happen, (2)
>  of the location of the write, and (3) of the value to be written.
>  But please carefully read the "CONTROL DEPENDENCIES" section and the
> -Documentation/RCU/rcu_dereference.txt file:  The compiler can and does
> +Documentation/RCU/rcu_dereference.rst file:  The compiler can and does
>  break dependencies in a great many highly creative ways.
>  
>   CPU 1 CPU 2
> diff --git a/Documentation/process/submit-checklist.rst 
> b/Documentation/process/submit-checklist.rst
> index 8e56337d422d..3f8e9d5d95c2 100644
> --- a/Documentation/process/submit-checklist.rst
> +++ b/Documentation/process/submit-checklist.rst
> @@ -107,7 +107,7 @@ and elsewhere regarding submitting Linux kernel patches.
>  and why.
>  
>  26) If any ioctl's are added by the patch, then also update
> -``Documentation/ioctl/ioctl-number.rst``.
> +``Documentation/userspace-api/ioctl/ioctl-number.rst``.
>  
>  27) If your modified source code depends on or uses any of the kernel
>  APIs or features that are related to the following ``Kconfig`` symbols,
> diff --git a/Documentation/translations/it_IT/process/submit-checklist.rst 
> b/Documentation/translations/it_IT/process/submit-checklist.rst
> index 995ee69fab11..3e575502690f 100644
> --- a/Documentation/translations/it_IT/process/submit-checklist.rst
> +++ b/Documentation/translations/it_IT/process/submit-checklist.rst
> @@ -117,7 +117,7 @@ sottomissione delle patch, in particolare
>  sorgenti che ne spieghi la logica: cosa fanno e perché.
>  
>  25) Se la patch aggiunge nuove chiamate ioctl, allora aggiornate
> -``Documentation/ioctl/ioctl-number.rst``.
> +``Documentation/userspace-api/ioctl/ioctl-number.rst``.
>  
>  26) Se il codice che avete modificato dipende o usa una qualsiasi 
> interfaccia o
>  funzionalità del kernel che è associata a uno dei seguenti simboli
> diff --git a/Documentation/translations/ko_KR/memory-barriers.txt 
> b/Documentation/translations/ko_KR/memory-barriers.txt
> index 2e831ece6e26..e50fe6541335 100644
> --- a/Documentation/translations/ko_KR/memory-barriers.txt
> +++ b/Documentation/translations/ko_KR/memory-barriers.txt
> @@ -641,7 +641,7 @@ P 는 짝수 번호 캐시 라인에 저장되어 있고, 변수 B 는 홀수 
>  리눅스 커널이 지원하는 CPU 들은 (1) 쓰기가 정말로 일어날지, (2) 쓰기가 어디에
>  이루어질지, 그리고 (3) 쓰여질 값을 확실히 알기 전까지는 쓰기를 수행하지 않기
>  때문입니다.  하지만 "컨트롤 의존성" 섹션과
> -Documentation/RCU/rcu_dereference.txt 파일을 주의 깊게 읽어 주시기 바랍니다:
> +Documentation/RCU/rcu_dereference.rst 파일을 주의 깊게 읽어 주시기 바랍니다:
>  컴파일러는 매우 창의적인 많은 방법으로 종속성을 깰 수 있습니다.
>  
>   CPU 1 CPU 2
> diff --git a/Documentation/translations/zh_CN/filesystems/sysfs.txt 
> b/Documenta

[PATCH, RESEND, 2/3] selftests/powerpc: enable performance alerts when freezing counters on cycles_with_freeze_test selftest

2020-04-08 Thread Desnes A. Nunes do Rosario
From: Gustavo Romero 

When disabling freezing counters by setting MMCR0 FC bit to 0, the MMCR0
PMAE bit must also be enabled if a Performance Monitor Alert (and the cor-
responding Performance Monitor Interrupt) is still desired to be received
when an enabled condition or event occurs.

This is the case of the cycles_with_freeze_test selftest, since the test
disables the MMCR0 PMAE due to the usage of PMU to trigger EBBs. This can
make the test loop up to the point of being killed by the test harness
timeout (2500 ms), since no other ebb event will happen because the MMCR0
PMAE bit is disabled, and thus, no more increments to ebb_count occur.

Fixes: 3752e453f6bafd7 ("selftests/powerpc: Add tests of PMU EBBs")
Signed-off-by: Gustavo Romero 
[desnesn: Only set MMCR0_PMAE when disabling MMCR0_FC, reflow comment]
Signed-off-by: Desnes A. Nunes do Rosario 
---
 .../testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index 0f2089f6f82c..d368199144fb 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -81,7 +81,7 @@ int cycles_with_freeze(void)
{
counters_frozen = false;
mb();
-   mtspr(SPRN_MMCR0, mfspr(SPRN_MMCR0) & ~MMCR0_FC);
+   mtspr(SPRN_MMCR0, (mfspr(SPRN_MMCR0) & ~MMCR0_FC) | MMCR0_PMAE);
 
FAIL_IF(core_busy_loop());
 
-- 
2.21.1



[PATCH, RESEND, 0/3] selftests/powerpc: A few fixes on powerpc selftests

2020-04-08 Thread Desnes A. Nunes do Rosario
This patchseries addresses a few fixes on powerpc selftests (first and
second patch are being resent).

The first fix has to do with the extra counts on PMCs resets, which not
only are shown on the trace_logs, but can also invalidate the results of a
few selftests. On the other hand, the second fix proper addresses the Per-
formance Monitor Alert (PMAE) bit on MMCR0 when freezing counters are dis-
abled on cycles_with_freeze_test selftest. Lastly, the third fix adds a
memory barrier on count_pmc() to ensure read consistency of PMCs. This is
necessary since these values are usually accounted on ebb_handlers to val-
lidade tests results, such as the overflow values on pmc56_overflow_test.

Desnes A. Nunes do Rosario (2):
  selftests/powerpc: Use write_pmc instead of count_pmc to reset PMCs on
ebb tests
  selftests/powerpc: ensure PMC reads are set and ordered on count_pmc

Gustavo Romero (1):
  selftests/powerpc: enable performance alerts when freezing counters on
cycles_with_freeze_test selftest

 .../powerpc/pmu/ebb/back_to_back_ebbs_test.c |  2 +-
 .../testing/selftests/powerpc/pmu/ebb/cycles_test.c  |  2 +-
 .../powerpc/pmu/ebb/cycles_with_freeze_test.c|  4 ++--
 .../powerpc/pmu/ebb/cycles_with_mmcr2_test.c |  2 +-
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c|  6 +-
 .../powerpc/pmu/ebb/ebb_on_willing_child_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/multi_counter_test.c   | 12 ++--
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c |  2 +-
 .../selftests/powerpc/pmu/ebb/pmae_handling_test.c   |  2 +-
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c  |  2 +-
 11 files changed, 21 insertions(+), 17 deletions(-)

-- 
2.21.1



[PATCH, RESEND, 1/3] selftests/powerpc: Use write_pmc instead of count_pmc to reset PMCs on ebb tests

2020-04-08 Thread Desnes A. Nunes do Rosario
By using count_pmc() to reset PMCs instead of write_pmc(), an extra count
is performed on ebb_state.stats.pmc_count[PMC_INDEX(pmc)]. This extra
pmc_count can occasionally invalidate results, such as the ones from
cycles_test shown hereafter. The ebb_check_count() failed with an above
the upper limit error due to the extra ebb_state.stats.pmc_count.

Furthermore, this extra count is also indicated by extra PMC1 trace_log on
the output of the cycle test (as well as on pmc56_overflow_test):

==
   ...
   [21]: counter = 8
   [22]: register SPRN_MMCR0 = 0x8080
   [23]: register SPRN_PMC1  = 0x8004
   [24]: counter = 9
   [25]: register SPRN_MMCR0 = 0x8080
   [26]: register SPRN_PMC1  = 0x8004
   [27]: counter = 10
   [28]: register SPRN_MMCR0 = 0x8080
   [29]: register SPRN_PMC1  = 0x8004
>> [30]: register SPRN_PMC1  = 0x451e
PMC1 count (0x28546) above upper limit 0x283e8 (+0x15e)
[FAIL] Test FAILED on line 52
failure: cycles
==

[desnesn: reflow of original comment]
Signed-off-by: Desnes A. Nunes do Rosario 
---
 .../powerpc/pmu/ebb/back_to_back_ebbs_test.c |  2 +-
 .../testing/selftests/powerpc/pmu/ebb/cycles_test.c  |  2 +-
 .../powerpc/pmu/ebb/cycles_with_freeze_test.c|  2 +-
 .../powerpc/pmu/ebb/cycles_with_mmcr2_test.c |  2 +-
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c|  2 +-
 .../powerpc/pmu/ebb/ebb_on_willing_child_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/lost_exception_test.c  |  2 +-
 .../selftests/powerpc/pmu/ebb/multi_counter_test.c   | 12 ++--
 .../selftests/powerpc/pmu/ebb/multi_ebb_procs_test.c |  2 +-
 .../selftests/powerpc/pmu/ebb/pmae_handling_test.c   |  2 +-
 .../selftests/powerpc/pmu/ebb/pmc56_overflow_test.c  |  2 +-
 11 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
index a2d7b0e3dca9..f133ab425f10 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/back_to_back_ebbs_test.c
@@ -91,7 +91,7 @@ int back_to_back_ebbs(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
index bc893813483e..14a399a64729 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_test.c
@@ -42,7 +42,7 @@ int cycles(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
index dcd351d20328..0f2089f6f82c 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_freeze_test.c
@@ -99,7 +99,7 @@ int cycles_with_freeze(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
index 94c99c12c0f2..a8f3bee04cd8 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/cycles_with_mmcr2_test.c
@@ -71,7 +71,7 @@ int cycles_with_mmcr2(void)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index dfbc5c3ad52d..bf6f25dfcf7b 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -396,7 +396,7 @@ int ebb_child(union pipe read_pipe, union pipe write_pipe)
ebb_global_disable();
ebb_freeze_pmcs();
 
-   count_pmc(1, sample_period);
+   write_pmc(1, pmc_sample_period(sample_period));
 
dump_ebb_state();
 
diff --git 
a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
index ca2f7d729155..513812cdcca1 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb_on_willing_child_test.c
@@ -38,7 +38,7 @@ static int victim_child(union pipe read_pipe, union pipe 
write_pipe)

[PATCH 3/3] selftests/powerpc: ensure PMC reads are set and ordered on count_pmc

2020-04-08 Thread Desnes A. Nunes do Rosario
Function count_pmc() needs a memory barrier to ensure that PMC reads are
fully consistent. The lack of it can occasionally fail pmc56_overflow test,
since depending on the workload on the system, PMC5 & 6 can have past val-
ues from the time the counters are frozen and turned back on. These past
values will be accounted as overflows and make the test fail.

=
test: pmc56_overflow
...
ebb_state:
...
>>pmc[5] count = 0xfd4cbc8c
>>pmc[6] count = 0xddd8b3b6
HW state:
MMCR0 0x8400 FC PMAE
MMCR2 0x
EBBHR 0x10003f68
BESCR 0x8000 GE
...
PMC5  0x
PMC6  0x
SIAR  0x10003398
...
  [3]: register SPRN_PMC2  = 0x8003
  [4]: register SPRN_PMC5  = 0x
  [5]: register SPRN_PMC6  = 0x
  [6]: register SPRN_PMC2  = 0x8003
>>[7]: register SPRN_PMC5  = 0x8f21266d
>>[8]: register SPRN_PMC6  = 0x0da80f8d
  [9]: register SPRN_PMC2  = 0x8003
>>[10]: register SPRN_PMC5  = 0x6e2b961f
>>[11]: register SPRN_PMC6  = 0xd030a429
  [12]: register SPRN_PMC2  = 0x8003
  [13]: register SPRN_PMC5  = 0x
  [14]: register SPRN_PMC6  = 0x
...
PMC5/6 overflow 2
[FAIL] Test FAILED on line 87
failure: pmc56_overflow
=

Signed-off-by: Desnes A. Nunes do Rosario 
---
 tools/testing/selftests/powerpc/pmu/ebb/ebb.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c 
b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
index bf6f25dfcf7b..6199f3cea0f9 100644
--- a/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
+++ b/tools/testing/selftests/powerpc/pmu/ebb/ebb.c
@@ -258,6 +258,10 @@ int count_pmc(int pmc, uint32_t sample_period)
start_value = pmc_sample_period(sample_period);
 
val = read_pmc(pmc);
+
+   /* Ensure pmc value is consistent between freezes */
+   mb();
+
if (val < start_value)
ebb_state.stats.negative++;
else
-- 
2.21.1



[PATCH 1/1] powerpc/rtas: Implement reentrant rtas call

2020-04-08 Thread Leonardo Bras
Implement rtas_call_reentrant() for reentrant rtas-calls:
"ibm,int-on", "ibm,int-off",ibm,get-xive" and  "ibm,set-xive".

On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
items 2 and 3 say:

2 - For the PowerPC External Interrupt option: The * call must be
reentrant to the number of processors on the platform.
3 - For the PowerPC External Interrupt option: The * argument call
buffer for each simultaneous call must be physically unique.

So, these rtas-calls can be called in a lockless way, if using
a different buffer for each call.

This can be useful to avoid deadlocks in crashing, where rtas-calls are
needed, but some other thread crashed holding the rtas.lock.

Signed-off-by: Leonardo Bras 
---
 arch/powerpc/include/asm/rtas.h |  1 +
 arch/powerpc/kernel/rtas.c  | 21 +
 arch/powerpc/sysdev/xics/ics-rtas.c | 22 +++---
 3 files changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 3c1887351c71..1ad1c85dab5e 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -352,6 +352,7 @@ extern struct rtas_t rtas;
 extern int rtas_token(const char *service);
 extern int rtas_service_present(const char *service);
 extern int rtas_call(int token, int, int, int *, ...);
+int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...);
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
int nret, ...);
 extern void __noreturn rtas_restart(char *cmd);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index c5fa251b8950..85e7511afe25 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -483,6 +483,27 @@ int rtas_call(int token, int nargs, int nret, int 
*outputs, ...)
 }
 EXPORT_SYMBOL(rtas_call);
 
+/*
+ * Used for reentrant rtas calls.
+ * According to LoPAR documentation, only "ibm,int-on", "ibm,int-off",
+ * "ibm,get-xive" and "ibm,set-xive" are currently reentrant.
+ * Reentrant calls need their own rtas_args buffer, so not using rtas.args.
+ */
+int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...)
+{
+   va_list list;
+   struct rtas_args rtas_args;
+
+   if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
+   return -1;
+
+   va_start(list, outputs);
+   va_rtas_call_unlocked(&rtas_args, token, nargs, nret, list);
+   va_end(list);
+
+   return be32_to_cpu(rtas_args.rets[0]);
+}
+
 /* For RTAS_BUSY (-2), delay for 1 millisecond.  For an extended busy status
  * code of 990n, perform the hinted delay of 10^n (last digit) milliseconds.
  */
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c 
b/arch/powerpc/sysdev/xics/ics-rtas.c
index 6aabc74688a6..4cf18000f07c 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -50,8 +50,8 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
 
server = xics_get_irq_server(d->irq, irq_data_get_affinity_mask(d), 0);
 
-   call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq, server,
-   DEFAULT_PRIORITY);
+   call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
+ server, DEFAULT_PRIORITY);
if (call_status != 0) {
printk(KERN_ERR
"%s: ibm_set_xive irq %u server %x returned %d\n",
@@ -60,7 +60,7 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
}
 
/* Now unmask the interrupt (often a no-op) */
-   call_status = rtas_call(ibm_int_on, 1, 1, NULL, hw_irq);
+   call_status = rtas_call_reentrant(ibm_int_on, 1, 1, NULL, hw_irq);
if (call_status != 0) {
printk(KERN_ERR "%s: ibm_int_on irq=%u returned %d\n",
__func__, hw_irq, call_status);
@@ -91,7 +91,7 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
if (hw_irq == XICS_IPI)
return;
 
-   call_status = rtas_call(ibm_int_off, 1, 1, NULL, hw_irq);
+   call_status = rtas_call_reentrant(ibm_int_off, 1, 1, NULL, hw_irq);
if (call_status != 0) {
printk(KERN_ERR "%s: ibm_int_off irq=%u returned %d\n",
__func__, hw_irq, call_status);
@@ -99,8 +99,8 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
}
 
/* Have to set XIVE to 0xff to be able to remove a slot */
-   call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq,
-   xics_default_server, 0xff);
+   call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
+ xics_default_server, 0xff);
if (call_status != 0) {
printk(KERN_ERR "%s: ibm_set_xive(0xff) irq=%u returned %d\n",
__func__, hw_irq, call_status);
@@ -131,7 +131,7 @@ static int ics_rtas_set_affinity(struct i

Re: usb: gadget: fsl_udc_core: Checking for a failed platform_get_irq() call in fsl_udc_probe()

2020-04-08 Thread Li Yang
On Wed, Apr 8, 2020 at 9:19 AM Markus Elfring  wrote:
>
> Hello,
>
> I have taken another look at the implementation of the function 
> “fsl_udc_probe”.
> A software analysis approach points the following source code out for
> further development considerations.
> https://elixir.bootlin.com/linux/v5.6.2/source/drivers/usb/gadget/udc/fsl_udc_core.c#L2443
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/usb/gadget/udc/fsl_udc_core.c?id=f5e94d10e4c468357019e5c28d48499f677b284f#n2442
>
> udc_controller->irq = platform_get_irq(pdev, 0);
> if (!udc_controller->irq) {
> ret = -ENODEV;
> goto err_iounmap;
> }
>
>
> The software documentation is providing the following information
> for the used programming interface.
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/base/platform.c?id=f5e94d10e4c468357019e5c28d48499f677b284f#n221
> https://elixir.bootlin.com/linux/v5.6.2/source/drivers/base/platform.c#L202
>
> “…
>  * Return: IRQ number on success, negative error number on failure.
> …”
>
> Would you like to reconsider the shown condition check?

Thanks for the finding.  This is truly a software issue that need to
be fixed.  Would you submit a patch for it or you want us to fix it?

Regards,
Leo


Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Leonardo Bras
On Wed, 2020-04-08 at 22:21 +1000, Michael Ellerman wrote:
[...]
> > According with LoPAR, for both of these rtas-calls, we have:
> > 
> > For the PowerPC External Interrupt option: The call must be reentrant
> > to the number of processors on the platform.
> > For the PowerPC External Interrupt option: The argument call buffer for
> > each simultaneous call must be physically unique.
> 
> Oh well spotted. Where is that in the doc?
> 
> > Which I think means this rtas-calls can be done simultaneously.
> 
> I think so too. I'll read PAPR in the morning and make sure.
> 
> > Would it mean that busting the rtas.lock for these calls would be safe?
> 
> What would be better is to make those specific calls not take the global
> RTAS lock to begin with.
> 
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

Hello Michael,

I did the simplest possible version of this change:
http://patchwork.ozlabs.org/patch/1268371/

Where I create a rtas_call_reentrant(), and replace rtas_call() for
that in all the possible calls of "ibm,int-on", "ibm,int-off",ibm,get-
xive" and  "ibm,set-xive".

At first, I was planning on creating a function that tells if the
requested token is one of above, before automatically choosing between
the common and reentrant versions. But it seemed like unnecessary
overhead, since the current calls are very few and very straight. 

What do you think on this?

Best regards,
Leonardo Bras


signature.asc
Description: This is a digitally signed message part


Re: [PATCH v3 1/1] ppc/crash: Reset spinlocks during crash

2020-04-08 Thread Paul Mackerras
On Wed, Apr 08, 2020 at 10:21:29PM +1000, Michael Ellerman wrote:
> 
> We should be able to just allocate the rtas_args on the stack, it's only
> ~80 odd bytes. And then we can use rtas_call_unlocked() which doesn't
> take the global lock.

Do we instantiate a 64-bit RTAS these days, or is it still 32-bit?
In the old days we had to make sure the RTAS argument buffer was
below the 4GB point.  If that's still necessary then perhaps putting
rtas_args inside the PACA would be the way to go.

Paul.


Re: [PATCH 31/35] powerpc: docs: cxl.rst: mark two section titles as such

2020-04-08 Thread Andrew Donnellan

On 9/4/20 1:46 am, Mauro Carvalho Chehab wrote:

The User API chapter contains two sub-chapters. Mark them as
such.

Signed-off-by: Mauro Carvalho Chehab 


Thanks.

Though the other subsections in this file use - rather than ^, 
what's the difference?


Acked-by: Andrew Donnellan 


---
  Documentation/powerpc/cxl.rst | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/Documentation/powerpc/cxl.rst b/Documentation/powerpc/cxl.rst
index 920546d81326..d2d77057610e 100644
--- a/Documentation/powerpc/cxl.rst
+++ b/Documentation/powerpc/cxl.rst
@@ -133,6 +133,7 @@ User API
  
  
  1. AFU character devices

+
  
  For AFUs operating in AFU directed mode, two character device

  files will be created. /dev/cxl/afu0.0m will correspond to a
@@ -395,6 +396,7 @@ read
  
  
  2. Card character device (powerVM guest only)

+^
  
  In a powerVM guest, an extra character device is created for the

  card. The device is only used to write (flash) a new image on the



--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



Re: [PATCH V2 0/3] mm/debug: Add more arch page table helper tests

2020-04-08 Thread Anshuman Khandual


On 04/08/2020 05:45 PM, Gerald Schaefer wrote:
> On Wed, 8 Apr 2020 12:41:51 +0530
> Anshuman Khandual  wrote:
> 
> [...]
>>>   

 Some thing like this instead.

 pte_t pte = READ_ONCE(*ptep);
 pte = pte_mkhuge(__pte((pte_val(pte) | RANDOM_ORVALUE) & PMD_MASK));

 We cannot use mk_pte_phys() as it is defined only on some platforms
 without any generic fallback for others.  
>>>
>>> Oh, didn't know that, sorry. What about using mk_pte() instead, at least
>>> it would result in a present pte:
>>>
>>> pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), prot));  
>>
>> Lets use mk_pte() here but can we do this instead
>>
>> paddr = (__pfn_to_phys(pfn) | RANDOM_ORVALUE) & PMD_MASK;
>> pte = pte_mkhuge(mk_pte(phys_to_page(paddr), prot));
>>
> 
> Sure, that will also work.
> 
> BTW, this RANDOM_ORVALUE is not really very random, the way it is
> defined. For s390 we already changed it to mask out some arch bits,
> but I guess there are other archs and bits that would always be
> set with this "not so random" value, and I wonder if/how that would
> affect all the tests using this value, see also below.

RANDOM_ORVALUE is a constant which was added in order to make sure
that the page table entries should have some non-zero value before
getting called with pxx_clear() and followed by a pxx_none() check.
This is currently used only in pxx_clear_tests() tests. Hence there
is no impact for the existing tests.

> 
>>>
>>> And if you also want to do some with the existing value, which seems
>>> to be an empty pte, then maybe just check if writing and reading that
>>> value with set_huge_pte_at() / huge_ptep_get() returns the same,
>>> i.e. initially w/o RANDOM_ORVALUE.
>>>
>>> So, in combination, like this (BTW, why is the barrier() needed, it
>>> is not used for the other set_huge_pte_at() calls later?):  
>>
>> Ahh missed, will add them. Earlier we faced problem without it after
>> set_pte_at() for a test on powerpc (64) platform. Hence just added it
>> here to be extra careful.
>>
>>>
>>> @@ -733,24 +733,28 @@ static void __init hugetlb_advanced_test
>>> struct page *page = pfn_to_page(pfn);
>>> pte_t pte = READ_ONCE(*ptep);
>>>  
>>> -   pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
>>> +   set_huge_pte_at(mm, vaddr, ptep, pte);
>>> +   WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>> +
>>> +   pte = pte_mkhuge(mk_pte(phys_to_page(RANDOM_ORVALUE & PMD_MASK), 
>>> prot));
>>> set_huge_pte_at(mm, vaddr, ptep, pte);
>>> barrier();
>>> WARN_ON(!pte_same(pte, huge_ptep_get(ptep)));
>>>
>>> This would actually add a new test "write empty pte with
>>> set_huge_pte_at(), then verify with huge_ptep_get()", which happens
>>> to trigger a warning on s390 :-)  
>>
>> On arm64 as well which checks for pte_present() in set_huge_pte_at().
>> But PTE present check is not really present in each set_huge_pte_at()
>> implementation especially without __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT.
>> Hence wondering if we should add this new test here which will keep
>> giving warnings on s390 and arm64 (at the least).
> 
> Hmm, interesting. I forgot about huge swap / migration, which is not
> (and probably cannot be) supported on s390. The pte_present() check
> on arm64 seems to check for such huge swap / migration entries,
> according to the comment.
> 
> The new test "write empty pte with set_huge_pte_at(), then verify
> with huge_ptep_get()" would then probably trigger the
> WARN_ON(!pte_present(pte)) in arm64 code. So I guess "writing empty
> ptes with set_huge_pte_at()" is not really a valid use case in practice,
> or else you would have seen this warning before. In that case, it
> might not be a good idea to add this test.

Got it.

> 
> I also do wonder now, why the original test with
> "pte = __pte(pte_val(pte) | RANDOM_ORVALUE);"
> did not also trigger that warning on arm64. On s390 this test failed
> exactly because the constructed pte was not present (initially empty,
> or'ing RANDOM_ORVALUE does not make it present for s390). I guess this
> just worked by chance on arm64, because the bits from RANDOM_ORVALUE
> also happened to mark the pte present for arm64.

That is correct. RANDOM_ORVALUE has got PTE_PROT_NONE bit set that makes
the PTE test for pte_present().

On arm64 platform,

#define pte_present(pte)  (!!(pte_val(pte) & (PTE_VALID | PTE_PROT_NONE)))

> 
> This brings us back to the question above, regarding the "randomness"
> of RANDOM_ORVALUE. Not really sure what the intention behind that was,
> but maybe it would make sense to restrict this RANDOM_ORVALUE to
> non-arch-specific bits, i.e. only bits that would be part of the
> address value within a page table entry? Or was it intentionally
> chosen to also mess with other bits?

As mentioned before, RANDOM_ORVALUE just helped make a given page table
entry contain non-zero values before getting cleared. AFAICS we should
not need RANDOM_ORVALUE for HugeTLB test here. I believe th

  1   2   >