Re: [PATCH v4 1/2] module: add elf_check_module_arch for module specific elf arch checks

2021-06-16 Thread Nicholas Piggin
Excerpts from Jessica Yu's message of June 16, 2021 10:54 pm:
> +++ Nicholas Piggin [16/06/21 11:18 +1000]:
>>Excerpts from Jessica Yu's message of June 15, 2021 10:17 pm:
>>> +++ Nicholas Piggin [15/06/21 12:05 +1000]:
Excerpts from Jessica Yu's message of June 14, 2021 10:06 pm:
> +++ Nicholas Piggin [11/06/21 19:39 +1000]:
>>The elf_check_arch() function is used to test usermode binaries, but
>>kernel modules may have more specific requirements. powerpc would like
>>to test for ABI version compatibility.
>>
>>Add an arch-overridable function elf_check_module_arch() that defaults
>>to elf_check_arch() and use it in elf_validity_check().
>>
>>Signed-off-by: Michael Ellerman 
>>[np: split patch, added changelog]
>>Signed-off-by: Nicholas Piggin 
>>---
>> include/linux/moduleloader.h | 5 +
>> kernel/module.c  | 2 +-
>> 2 files changed, 6 insertions(+), 1 deletion(-)
>>
>>diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
>>index 9e09d11ffe5b..fdc042a84562 100644
>>--- a/include/linux/moduleloader.h
>>+++ b/include/linux/moduleloader.h
>>@@ -13,6 +13,11 @@
>>  * must be implemented by each architecture.
>>  */
>>
>>+// Allow arch to optionally do additional checking of module ELF header
>>+#ifndef elf_check_module_arch
>>+#define elf_check_module_arch elf_check_arch
>>+#endif
>
> Hi Nicholas,
>
> Why not make elf_check_module_arch() consistent with the other
> arch-specific functions? Please see module_frob_arch_sections(),
> module_{init,exit}_section(), etc in moduleloader.h. That is, they are
> all __weak functions that are overridable by arches. We can maybe make
> elf_check_module_arch() a weak symbol, available for arches to
> override if they want to perform additional elf checks. Then we don't
> have to have this one-off #define.


Like this? I like it. Good idea.
>>>
>>> Yeah! Also, maybe we can alternatively make elf_check_module_arch() a
>>> separate check entirely so that the powerpc implementation doesn't
>>> have to include that extra elf_check_arch() call. Something like this maybe?
>>
>>Yeah we can do that. Would you be okay if it goes via powerpc tree? If
>>yes, then we should get your Ack (or SOB because it seems to be entirely
>>your patch now :D)
> 
> This can go through the powerpc tree. Will you do another respin
> of this patch? And yes, feel free to take my SOB for this one -
> 
>  Signed-off-by: Jessica Yu 

You're maintainer so let's go with your preference. We can always adjust 
the arch hooks later if a need comes up. And yes I'll re post with you 
cc'ed.
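
For reference, the approach we're discussing boils down to something like
this sketch (simplified, not the final patch):

	/* kernel/module.c: default does no extra checking; arches override */
	bool __weak elf_check_module_arch(Elf_Ehdr *hdr)
	{
		return true;
	}

	/* in elf_validity_check(), next to the existing check */
	if (!elf_check_arch(info->hdr) || !elf_check_module_arch(info->hdr))
		return -ENOEXEC;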

Thanks,
Nick


Re: [PATCH 02/11] powerpc: Add Microwatt device tree

2021-06-16 Thread Michael Ellerman
Paul Mackerras  writes:
>

Little bit of change log never hurts :)

> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/boot/dts/microwatt.dts | 105 
>  1 file changed, 105 insertions(+)
>  create mode 100644 arch/powerpc/boot/dts/microwatt.dts
>
> diff --git a/arch/powerpc/boot/dts/microwatt.dts 
> b/arch/powerpc/boot/dts/microwatt.dts
> new file mode 100644
> index ..9b2e64da9432
> --- /dev/null
> +++ b/arch/powerpc/boot/dts/microwatt.dts
> @@ -0,0 +1,105 @@
> +/dts-v1/;
> +
> +/ {
> + #size-cells = <0x02>;
> + #address-cells = <0x02>;
> + model-name = "microwatt";
> + compatible = "microwatt-soc";
> +
> + reserved-memory {
> + #size-cells = <0x02>;
> + #address-cells = <0x02>;
> + ranges;
> + };
> +
> + memory@0 {
> + device_type = "memory";
> + reg = <0x 0x 0x 0x1000>;
> + };
> +
> + cpus {
> + #size-cells = <0x00>;
> + #address-cells = <0x01>;
> +
> + ibm,powerpc-cpu-features {
> + display-name = "Microwatt";
> + isa = <3000>;
> + device_type = "cpu-features";
> + compatible = "ibm,powerpc-cpu-features";
> +
> + mmu-radix {
> + isa = <3000>;
> + usable-privilege = <2>;

skiboot says 6?

> + os-support = <0x00>;
> + };
> +
> + little-endian {
> + isa = <0>;

I guess you just copied that from skiboot.

The binding says it's required, but AFAICS the kernel doesn't use it.

And isa = 0 mean ISA_BASE, according to the skiboot source.

> + usable-privilege = <3>;
> + os-support = <0x00>;
> + };
> +
> + cache-inhibited-large-page {
> + isa = <0x00>;
> + usable-privilege = <2>;

skiboot says 6, ie. HV and OS (usable-privilege is a bitmask: bit 0 = PR,
bit 1 = OS, bit 2 = HV).
Don't think it actually matters because you say os-support = 0.

> + os-support = <0x00>;
> + };
> +
> + fixed-point-v3 {
> + isa = <3000>;
> + usable-privilege = <3>;

skiboot says 7.

> + };
> +
> + no-execute {
> + isa = <0x00>;
> + usable-privilege = <2>;

skiboot says 6.

> + os-support = <0x00>;
> + };
> +
> + floating-point {
> + hfscr-bit-nr = <0x00>;
> + hwcap-bit-nr = <0x1b>;

Looks right, bit 27:

#define PPC_FEATURE_HAS_FPU 0x08000000


> + isa = <0x00>;
> + usable-privilege = <0x07>;
> + hv-support = <0x00>;
> + os-support = <0x00>;
> + };
> + };
> +
> + PowerPC,Microwatt@0 {
> + i-cache-sets = <2>;
> + ibm,dec-bits = <64>;
> + reservation-granule-size = <64>;

Never seen that one before.

> + clock-frequency = <1>;
> + timebase-frequency = <1>;

Those seem quite high?

> + i-tlb-sets = <1>;
> + ibm,ppc-interrupt-server#s = <0>;
> + i-cache-block-size = <64>;
> + d-cache-block-size = <64>;

The kernel reads those, but also hard codes 128 in places.
See L1_CACHE_BYTES.

> + ibm,pa-features = [40 00 c2 27 00 00 00 80 00 00 00 00 
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 80 00 80 00 80 00 00 00 80 00 80 00 
> 00 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 80 00 
> 80 00];

Do you need that?

You shouldn't, if we've done things right with the cpu-features support.

> + d-cache-sets = <2>;
> + ibm,pir = <0x3c>;

Needed?

> + i-tlb-size = <64>;
> + cpu-version = <0x99>;
> + status = "okay";
> + i-cache-size = <0x1000>;
> + ibm,processor-radix-AP-encodings = <0x0c 0xa010 
> 0x2015 0x401e>;
> + tlb-size = <0>;
> + tlb-sets = <0>;

Does the kernel use those? I can't find it.

> + device_type = "cpu";
> + d-tlb-size = <128>;
> + d-tlb-sets = <2>;
> + reg = <0>;
> + general-purpose;
> + 64-bit;
> + d-cache-size = <0x1000>;
> +  

Re: [PATCH 11/11] powerpc/microwatt: Disable interrupts in boot wrapper main program

2021-06-16 Thread Michael Ellerman
Nicholas Piggin  writes:
> Excerpts from Segher Boessenkool's message of June 17, 2021 9:37 am:
>> On Tue, Jun 15, 2021 at 09:05:27AM +1000, Paul Mackerras wrote:
>>> This ensures that we don't get a decrementer interrupt arriving before
>>> we have set up a handler for it.
>> 
>> Maybe add a comment saying this is setting MSR[EE]=0 for that?  Or do
>> other bits here matter as well?
>
> Hmm, it actually clears MSR[RI] as well.
>
> __hard_irq_disable() is what we want here, unless the MSR[RI] clearing 
> is required as well, in which case there is __hard_EE_RI_disable().

But neither of those exist in the boot wrapper (yet).

cheers


Re: [PATCH v2 2/4] powerpc/interrupt: Refactor prep_irq_for_user_exit()

2021-06-16 Thread Nicholas Piggin
Excerpts from Christophe Leroy's message of June 15, 2021 6:37 pm:
> 
> 
> On 11/06/2021 at 04:30, Nicholas Piggin wrote:
>> Excerpts from Christophe Leroy's message of June 5, 2021 12:56 am:
>>> prep_irq_for_user_exit() is a superset of
>>> prep_irq_for_kernel_enabled_exit().
>>>
>>> Refactor it.
>> 
>> I like the refactoring, but now prep_irq_for_user_exit() is calling
>> prep_irq_for_kernel_enabled_exit(), which seems like the wrong naming.
>> 
>> You could re-name prep_irq_for_kernel_enabled_exit() to
>> prep_irq_for_enabled_exit() maybe? Or it could be
>> __prep_irq_for_enabled_exit() then prep_irq_for_kernel_enabled_exit()
>> and prep_irq_for_user_exit() would both call it.
> 
> I renamed it prep_irq_for_enabled_exit().
> 
> And I realised that after patch 4, prep_irq_for_enabled_exit() has become a 
> trivial function used 
> only once.
> 
> So I swapped patches 1/2 with patches 3/4 and added a 5th one to squash 
> prep_irq_for_enabled_exit() 
> into its caller.
> 
> You didn't have any comment on patch 4 (that is now patch 2) ?

I think it's okay, just trying to hunt down some apparent big-endian bug 
with my series. I can't see any problems with yours though, thanks for
rebasing them, I'll take a better look when I can.

Thanks,
Nick


[powerpc:next-test] BUILD SUCCESS 27f6bc40cd9485020869fb01244b77ae6a266680

2021-06-16 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
branch HEAD: 27f6bc40cd9485020869fb01244b77ae6a266680  selftests/powerpc: 
Always test lmw and stmw

elapsed time: 740m

configs tested: 153
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm          defconfig
arm64        defconfig
arm          allyesconfig
arm          allmodconfig
arm64        allyesconfig
powerpc      mpc885_ads_defconfig
arm          pxa_defconfig
sh           ap325rxa_defconfig
openrisc     or1ksim_defconfig
m68k         apollo_defconfig
arm          h3600_defconfig
arm          pleb_defconfig
sh           shmin_defconfig
powerpc      makalu_defconfig
parisc       generic-64bit_defconfig
arc          hsdk_defconfig
xtensa       audio_kc705_defconfig
mips         e55_defconfig
powerpc      mpc837x_rdb_defconfig
mips         workpad_defconfig
xtensa       generic_kc705_defconfig
arm          sama5_defconfig
arm          s5pv210_defconfig
ia64         bigsur_defconfig
arm          oxnas_v6_defconfig
powerpc      tqm5200_defconfig
powerpc      mpc8560_ads_defconfig
arm          vf610m4_defconfig
arm          omap2plus_defconfig
mips         maltaup_defconfig
sh           ul2_defconfig
um           defconfig
s390         debug_defconfig
sh           sdk7780_defconfig
sparc        alldefconfig
arm          ezx_defconfig
arc          nsimosci_defconfig
mips         cu1000-neo_defconfig
powerpc      gamecube_defconfig
microblaze   mmu_defconfig
xtensa       cadence_csp_defconfig
arm          multi_v7_defconfig
x86_64       allnoconfig
ia64         allmodconfig
ia64         defconfig
ia64         allyesconfig
m68k         allmodconfig
m68k         defconfig
m68k         allyesconfig
nios2        defconfig
arc          allyesconfig
nds32        allnoconfig
nds32        defconfig
nios2        allyesconfig
csky         defconfig
alpha        defconfig
alpha        allyesconfig
xtensa       allyesconfig
h8300        allyesconfig
arc          defconfig
sh           allmodconfig
parisc       defconfig
s390         allyesconfig
s390         allmodconfig
parisc       allyesconfig
s390         defconfig
i386         allyesconfig
sparc        allyesconfig
sparc        defconfig
i386         defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
i386 randconfig-a002-20210615
i386 randconfig-a006-20210615
i386 randconfig-a004-20210615
i386 randconfig-a001-20210615
i386 randconfig-a005-20210615
i386 randconfig-a003-20210615
i386 randconfig-a002-20210617
i386 randconfig-a006-20210617
i386 randconfig-a001-20210617
i386 randconfig-a004-20210617
i386 randconfig-a005-20210617
i386 randconfig-a003-20210617
x86_64   randconfig-a004-20210617
x86_64   randconfig-a001-20210617
x86_64   randconfig-a002-20210617
x86_64   randconfig-a003-20210617
x86_64   randconfig-a006-20210617
x86_64   randconfig-a005-20210617
x86_64   randconfig-a015-20210616
x86_64   randconfig-a011-20210616
x86_64   randconfig-a014-20210616
x86_64   randconfig-a012-20210616
x86_64   randconfig-a013-20210616
x86_64   randconfig-a016-20210616
i386 randconfig-a015-20210617
i386

RE: [LKP] Re: [mm/mremap] ecf8443e51: vm-scalability.throughput -29.4% regression

2021-06-16 Thread Liu, Yujie
> -Original Message-
> From: Aneesh Kumar K.V 
> Sent: Tuesday, June 15, 2021 12:09 AM
> To: Sang, Oliver 
> Cc: lkp ; LKML ; 
> l...@lists.01.org; linux...@kvack.org; a...@linux-foundation.org;
> m...@ellerman.id.au; linuxppc-dev@lists.ozlabs.org; kaleshsi...@google.com; 
> npig...@gmail.com; j...@joelfernandes.org; Christophe
> Leroy ; Linus Torvalds 
> ; Kirill A . Shutemov 
> Subject: [LKP] Re: [mm/mremap] ecf8443e51: vm-scalability.throughput -29.4% 
> regression
> 
> On 6/14/21 8:25 PM, kernel test robot wrote:
> >
> >
> > Greeting,
> >
> > FYI, we noticed a -29.4% regression of vm-scalability.throughput due to 
> > commit:
> >
> >
> > commit: ecf8443e51a862b261313c2319ab4e4aed9e6b7e ("[PATCH v7 02/11]
> > mm/mremap: Fix race between MOVE_PUD mremap and pageout")
> > url:
> > https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/Speedup-mrem
> > ap-on-ppc64/20210607-135424
> > base: https://git.kernel.org/cgit/linux/kernel/git/powerpc/linux.git
> > next
> >
> >
> 
> We dropped that approach and are now using
> https://lore.kernel.org/linux-mm/20210610083549.386085-1-aneesh.ku...@linux.ibm.com
> 
> 
> Instead of pud lock we are now using rmap lock with mremap.
> 
> Can you check with that series?

Hi Aneesh,

Could you please specify the base commit of the patch series? 

We have applied the new patch series (rmap lock with mremap) on commit
027f55e87c30 (tty: hvc: udbg_hvc: retry putc on -EAGAIN), and it shows no
regression after the patch.
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled/ucode:
  gcc-9/performance/x86_64-rhel-8.3/1/8/debian-10.4-x86_64-20200603.cgz/300/lkp-csl-2ap1/swap-w-seq-mt/vm-scalability/always/never/0x4003006

commit:
  027f55e87c30 (tty: hvc: udbg_hvc: retry putc on -EAGAIN)   <--- patch's base commit
  ecf8443e51a8 (mm/mremap: Fix race between MOVE_PUD mremap and pageout)   <--- first bad commit
  57da7477067d (fixup)   <--- apply patch on 027f55e87c30

027f55e87c309427 ecf8443e51a862b261313c2319a 57da7477067dbe29247484eda0e
---------------- --------------------------- ---------------------------
       %stddev     %change         %stddev     %change         %stddev
           \          |                \          |                \
    371814 ±  3%     -29.1%     263582 ±  3%      -0.1%     371286 ±  2%  vm-scalability.median

We also tried to apply the patches on commit ecf8443e51a8 (mm/mremap: Fix race
between MOVE_PUD mremap and pageout), and the regression increased to -33.9%
=========================================================================================
compiler/cpufreq_governor/kconfig/nr_ssd/nr_task/rootfs/runtime/tbox_group/test/testcase/thp_defrag/thp_enabled/ucode:
  gcc-9/performance/x86_64-rhel-8.3/1/8/debian-10.4-x86_64-20200603.cgz/300/lkp-csl-2ap1/swap-w-seq-mt/vm-scalability/always/never/0x4003006

commit:
  5f80ee2fc08b (mm/mremap: Fix race between MOVE_PUD mremap and pageout)   <--- first bad commit's parent
  ecf8443e51a8 (mm/mremap: Fix race between MOVE_PMD mremap and pageout)   <--- first bad commit
  8ae369d45894 (fixup)   <--- apply patch on ecf8443e51a8

5f80ee2fc08b3613 ecf8443e51a862b261313c2319a 8ae369d4589492c7f7198cd742d
---------------- --------------------------- ---------------------------
       %stddev     %change         %stddev     %change

Re: [PATCH] powerpc/build: vdso linker warning for orphan sections

2021-06-16 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of June 11, 2021 9:10 pm:
> Add --orphan-handling=warn for vdsos, and adjust vdso linker scripts to
> deal with orphan sections.
> 
> Signed-off-by: Nicholas Piggin 

Okay it looks like modules should discard .PPC.EMB.apuinfo. Not entirely 
sure about .rela.opd.

Thanks,
Nick


Re: [PATCH v2 6/6] mm/mremap: hold the rmap lock in write mode when moving page table entries.

2021-06-16 Thread Andrew Morton
On Wed, 16 Jun 2021 10:22:39 +0530 "Aneesh Kumar K.V" 
 wrote:

> To avoid a race between rmap walk and mremap, mremap does take_rmap_locks().
> The lock was taken to ensure that the rmap walk doesn't miss a page table
> entry due to PTE moves via move_page_tables(). The kernel further optimizes
> this locking: if the newly added vma will be found after the old vma, the
> rmap lock is not taken. This is because the rmap walk would find the vmas in
> the same order, and if we don't find the page table attached to the older
> vma, we would find it with the new vma, which we iterate later.
> 
> As explained in commit eb66ae030829 ("mremap: properly flush TLB before 
> releasing the page")
> mremap is special in that it doesn't take ownership of the page. The
> optimized version for PUD/PMD aligned mremap also doesn't hold the ptl lock.
> This can result in stale TLB entries as shown below.
> 
> ...
>
> Cc: sta...@vger.kernel.org

Sneaking a -stable patch into the middle of all of this was ... sneaky :(

It doesn't actually apply to current mainline either.

I think I'll pretend I didn't notice.  Please sort this out with Greg
when he reports this back to you.


Re: [PATCH 11/11] powerpc/microwatt: Disable interrupts in boot wrapper main program

2021-06-16 Thread Nicholas Piggin
Excerpts from Segher Boessenkool's message of June 17, 2021 9:37 am:
> On Tue, Jun 15, 2021 at 09:05:27AM +1000, Paul Mackerras wrote:
>> This ensures that we don't get a decrementer interrupt arriving before
>> we have set up a handler for it.
> 
> Maybe add a comment saying this is setting MSR[EE]=0 for that?  Or do
> other bits here matter as well?

Hmm, it actually clears MSR[RI] as well.

__hard_irq_disable() is what we want here, unless the MSR[RI] clearing 
is required as well, in which case there is __hard_EE_RI_disable().
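
For illustration, clearing just EE in the wrapper could look something like
this (a sketch only: the helper name is made up, it assumes a 64-bit build
for mtmsrd, and the wrapper would need its own MSR_EE define):

	#define MSR_EE	(1UL << 15)	/* external/decrementer interrupt enable */

	static inline void wrapper_hard_irq_disable(void)
	{
		unsigned long msr;

		__asm__ volatile("mfmsr %0" : "=r" (msr));
		__asm__ volatile("mtmsrd %0" : : "r" (msr & ~MSR_EE));
	}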

Thanks,
Nick


Re: [PATCH v2 0/6] mrermap fixes

2021-06-16 Thread Andrew Morton
On Wed, 16 Jun 2021 10:22:33 +0530 "Aneesh Kumar K.V" 
 wrote:

> This patch series is split out from [PATCH v7 00/11] Speedup mremap on
> ppc64
> (https://lore.kernel.org/linux-mm/20210607055131.156184-1-aneesh.ku...@linux.ibm.com)
> dropping ppc64 specific changes.
> 
> This patchset is dependent on
> https://lore.kernel.org/linux-mm/20210615110859.320299-1-aneesh.ku...@linux.ibm.com

Which I just dropped because of all the build breakages :(


Re: [PATCH v12 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-16 Thread Stefano Stabellini
On Wed, 16 Jun 2021, Claire Chang wrote:
> Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> use it to determine whether to bounce the data or not. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 
> ---
>  include/linux/swiotlb.h | 11 +++
>  kernel/dma/direct.c |  2 +-
>  kernel/dma/direct.h |  2 +-
>  kernel/dma/swiotlb.c|  4 
>  4 files changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index dd1c30a83058..8d8855c77d9a 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
>   *   unmap calls.
>   * @debugfs: The dentry to debugfs.
>   * @late_alloc:  %true if allocated using the page allocator
> + * @force_bounce: %true if swiotlb bouncing is forced
>   */
>  struct io_tlb_mem {
>   phys_addr_t start;
> @@ -94,6 +95,7 @@ struct io_tlb_mem {
>   spinlock_t lock;
>   struct dentry *debugfs;
>   bool late_alloc;
> + bool force_bounce;
>   struct io_tlb_slot {
>   phys_addr_t orig_addr;
>   size_t alloc_size;
> @@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>   return mem && paddr >= mem->start && paddr < mem->end;
>  }
>  
> +static inline bool is_swiotlb_force_bounce(struct device *dev)
> +{
> + return dev->dma_io_tlb_mem->force_bounce;
> +}
>  void __init swiotlb_exit(void);
>  unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
> @@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
> phys_addr_t paddr)
>  {
>   return false;
>  }
> +static inline bool is_swiotlb_force_bounce(struct device *dev)
> +{
> + return false;
> +}
>  static inline void swiotlb_exit(void)
>  {
>  }
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 7a88c34d0867..a92465b4eb12 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
>  {
>   /* If SWIOTLB is active, use its maximum mapping size */
>   if (is_swiotlb_active(dev) &&
> - (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
> + (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
>   return swiotlb_max_mapping_size(dev);
>   return SIZE_MAX;
>  }
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 13e9e7158d94..4632b0f4f72e 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device 
> *dev,
>   phys_addr_t phys = page_to_phys(page) + offset;
>   dma_addr_t dma_addr = phys_to_dma(dev, phys);
>  
> - if (unlikely(swiotlb_force == SWIOTLB_FORCE))
> + if (is_swiotlb_force_bounce(dev))
>   return swiotlb_map(dev, phys, size, dir, attrs);
>
>   if (unlikely(!dma_capable(dev, dma_addr, size, true))) {

Should we also make the same change in
drivers/xen/swiotlb-xen.c:xen_swiotlb_map_page ?

If I make that change, I can see that everything is working as
expected for a restricted-dma device with Linux running as dom0 on Xen.
However, is_swiotlb_force_bounce returns non-zero even for normal
non-restricted-dma devices. That shouldn't happen, right?

It looks like struct io_tlb_slot is not zeroed on allocation.
Adding memset(mem, 0x0, struct_size) in swiotlb_late_init_with_tbl
solves the issue.
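
Concretely, something along these lines (a sketch against this series; the
allocation shape is assumed):

	mem = (void *)__get_free_pages(GFP_KERNEL,
				       get_order(struct_size(mem, slots, nslabs)));
	if (!mem)
		return -ENOMEM;
	/* zero force_bounce, debugfs, and the whole io_tlb_slot array */
	memset(mem, 0, struct_size(mem, slots, nslabs));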

With those two changes, the series passes my tests and you can add my
tested-by.


Re: [PATCH v4 2/4] lazy tlb: allow lazy tlb mm refcounting to be configurable

2021-06-16 Thread Nicholas Piggin
Excerpts from Nicholas Piggin's message of June 16, 2021 11:02 am:
> Excerpts from Andy Lutomirski's message of June 16, 2021 10:14 am:
>> akpm, please drop this series until it's fixed.  It's a core change to
>> better support arch usecases, but it's unnecessarily fragile, and there
>> is already an arch maintainer pointing out that it's inadequate to
>> robustly support arch usecases.  There is no reason to merge it in its
>> present state.

Just to make sure I'm not doing anything stupid or fragile for other 
archs, I had a closer look at a few. sparc32 is the only one I have a 
SMP capable qemu and initramfs at hand for, took about 5 minutes to 
convert after fixing 2 other sparc32/mm bugs (patches on linux-sparc),
one of them found by the DEBUG_VM code my series added. It seems to work 
fine, with what little stressing my qemu setup can muster.

Simple. Robust. Pretty mechanical conversion that follows the documented 
recipe. Re-uses every single line of code I added outside 
arch/powerpc/. Requires no elaborate dances.

alpha and arm64 are both 4-liners by the looks, sparc64 might require a 
bit of actual code but doesn't look too hard.

So I'm satisfied the code added outside arch/powerpc/ is not some 
fragile powerpc specific hack. I don't know if other archs will use 
it, but they easily can use it[*].

And we can make changes to help x86 whenever its needed -- I already 
posted patch 1/n for configuring out lazy tlb and active_mm from core 
code rebased on top of mmotm so the series is not preventing such 
changes.

Hopefully this allays some concerns.

[*] I do think mmgrab_lazy_tlb is a nice change that self-documents the 
active_mm refcounting, so I will try to get all the arch code 
converted to use it over the next few releases, even if they never
switch to use lazy tlb shootdown.
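
(For reference, mmgrab_lazy_tlb() is just a thin wrapper, roughly:

	static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
	{
		/* only a real refcount when lazy tlb refcounting is enabled */
		if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_REFCOUNT))
			mmgrab(mm);
	}

so the conversion below is behaviour-neutral until the arch opts in to
shootdown.)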

Thanks,
Nick

---
 arch/sparc/Kconfig| 1 +
 arch/sparc/kernel/leon_smp.c  | 2 +-
 arch/sparc/kernel/smp_64.c| 2 +-
 arch/sparc/kernel/sun4d_smp.c | 2 +-
 arch/sparc/kernel/sun4m_smp.c | 2 +-
 arch/sparc/kernel/traps_32.c  | 2 +-
 arch/sparc/kernel/traps_64.c  | 2 +-
 7 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 164a5254c91c..db9954af57a2 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -58,6 +58,7 @@ config SPARC32
select GENERIC_ATOMIC64
select CLZ_TAB
select HAVE_UID16
+   select MMU_LAZY_TLB_SHOOTDOWN
select OLD_SIGACTION
 
 config SPARC64
diff --git a/arch/sparc/kernel/leon_smp.c b/arch/sparc/kernel/leon_smp.c
index 1eed26d423fb..d00460788048 100644
--- a/arch/sparc/kernel/leon_smp.c
+++ b/arch/sparc/kernel/leon_smp.c
@@ -92,7 +92,7 @@ void leon_cpu_pre_online(void *arg)
 : "memory" /* paranoid */);
 
/* Attach to the address space of init_task. */
-   mmgrab(&init_mm);
+   mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;

while (!cpumask_test_cpu(cpuid, &smp_commenced_mask))
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index e38d8bf454e8..19aa12991f2b 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -127,7 +127,7 @@ void smp_callin(void)
current_thread_info()->new_child = 0;
 
/* Attach to the address space of init_task. */
-   mmgrab(&init_mm);
+   mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
 
/* inform the notifiers about the new cpu */
diff --git a/arch/sparc/kernel/sun4d_smp.c b/arch/sparc/kernel/sun4d_smp.c
index ff30f03beb7c..a6f392dcfeaf 100644
--- a/arch/sparc/kernel/sun4d_smp.c
+++ b/arch/sparc/kernel/sun4d_smp.c
@@ -94,7 +94,7 @@ void sun4d_cpu_pre_online(void *arg)
show_leds(cpuid);
 
/* Attach to the address space of init_task. */
-   mmgrab(&init_mm);
+   mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
 
local_ops->cache_all();
diff --git a/arch/sparc/kernel/sun4m_smp.c b/arch/sparc/kernel/sun4m_smp.c
index 228a6527082d..0ee77f066c9e 100644
--- a/arch/sparc/kernel/sun4m_smp.c
+++ b/arch/sparc/kernel/sun4m_smp.c
@@ -60,7 +60,7 @@ void sun4m_cpu_pre_online(void *arg)
 : "memory" /* paranoid */);
 
/* Attach to the address space of init_task. */
-   mmgrab(&init_mm);
+   mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;

while (!cpumask_test_cpu(cpuid, &smp_commenced_mask))
diff --git a/arch/sparc/kernel/traps_32.c b/arch/sparc/kernel/traps_32.c
index 247a0d9683b2..a3186bb30109 100644
--- a/arch/sparc/kernel/traps_32.c
+++ b/arch/sparc/kernel/traps_32.c
@@ -387,7 +387,7 @@ void trap_init(void)
thread_info_offsets_are_bolixed_pete();
 
/* Attach to the address space of init_task. */
-   mmgrab(&init_mm);
+   mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
 
/* NOTE: Other cpus have this done as they are started
diff --git a/arch/sparc/kernel/traps_64.c b/arch/sparc/kernel/traps_64.c
index 

Re: [PATCH v12 11/12] dt-bindings: of: Add restricted DMA pool

2021-06-16 Thread Stefano Stabellini
On Wed, 16 Jun 2021, Claire Chang wrote:
> Introduce the new compatible string, restricted-dma-pool, for restricted
> DMA. One can specify the address and length of the restricted DMA memory
> region by restricted-dma-pool in the reserved-memory node.
> 
> Signed-off-by: Claire Chang 
> ---
>  .../reserved-memory/reserved-memory.txt   | 36 +--
>  1 file changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git 
> a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
> b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> index e8d3096d922c..46804f24df05 100644
> --- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> +++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
> @@ -51,6 +51,23 @@ compatible (optional) - standard definition
>used as a shared pool of DMA buffers for a set of devices. It can
>be used by an operating system to instantiate the necessary pool
>management subsystem if necessary.
> +- restricted-dma-pool: This indicates a region of memory meant to be
> +  used as a pool of restricted DMA buffers for a set of devices. The
> +  memory region would be the only region accessible to those devices.
> +  When using this, the no-map and reusable properties must not be 
> set,
> +  so the operating system can create a virtual mapping that will be 
> used
> +  for synchronization. The main purpose for restricted DMA is to
> +  mitigate the lack of DMA access control on systems without an 
> IOMMU,
> +  which could result in the DMA accessing the system memory at
> +  unexpected times and/or unexpected addresses, possibly leading to 
> data
> +  leakage or corruption. The feature on its own provides a basic 
> level
> +  of protection against the DMA overwriting buffer contents at
> +  unexpected times. However, to protect against general data leakage 
> and
> +  system memory corruption, the system needs to provide way to lock 
> down
> +  the memory access, e.g., MPU. Note that since coherent allocation
> +  needs remapping, one must set up another device coherent pool by
> +  shared-dma-pool and use dma_alloc_from_dev_coherent instead for 
> atomic
> +  coherent allocation.
>  - vendor specific string in the form ,[-]
>  no-map (optional) - empty property
>  - Indicates the operating system must not create a virtual mapping
> @@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one 
> for each corresponding
>  
>  Example
>  ---
> -This example defines 3 contiguous regions are defined for Linux kernel:
> +This example defines 4 contiguous regions for Linux kernel:
>  one default of all device drivers (named linux,cma@7200 and 64MiB in 
> size),
> -one dedicated to the framebuffer device (named framebuffer@7800, 8MiB), 
> and
> -one for multimedia processing (named multimedia-memory@7700, 64MiB).
> +one dedicated to the framebuffer device (named framebuffer@7800, 8MiB),
> +one for multimedia processing (named multimedia-memory@7700, 64MiB), and
> +one for restricted dma pool (named restricted_dma_reserved@0x5000, 
> 64MiB).
>  
>  / {
>   #address-cells = <1>;
> @@ -120,6 +138,11 @@ one for multimedia processing (named 
> multimedia-memory@7700, 64MiB).
>   compatible = "acme,multimedia-memory";
>   reg = <0x7700 0x400>;
>   };
> +
> + restricted_dma_reserved: restricted_dma_reserved {
> + compatible = "restricted-dma-pool";
> + reg = <0x5000 0x400>;
> + };
>   };
>  
>   /* ... */
> @@ -138,4 +161,11 @@ one for multimedia processing (named 
> multimedia-memory@7700, 64MiB).
>   memory-region = <_reserved>;
>   /* ... */
>   };
> +
> + pcie_device: pcie_device@0,0 {
> + reg = <0x8301 0x0 0x 0x0 0x0010
> +0x8301 0x0 0x0010 0x0 0x0010>;
> + memory-region = <&restricted_dma_mem_reserved>;

Shouldn't it be &restricted_dma_reserved ?



Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-16 Thread Andy Lutomirski
On Wed, Jun 16, 2021, at 3:20 AM, Will Deacon wrote:
> 
> For the arm64 bits (docs and asm/sync_core.h):
> 
> Acked-by: Will Deacon 
> 

Thanks.

Per Nick's suggestion, I renamed the header to membarrier.h.  Unless I hear 
otherwise, I'll keep the ack.

> Will
> 


Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-16 Thread Andy Lutomirski
On Wed, Jun 16, 2021, at 11:52 AM, Andy Lutomirski wrote:
> On 6/15/21 9:45 PM, Nicholas Piggin wrote:
> > Excerpts from Andy Lutomirski's message of June 16, 2021 1:21 pm:
> >> The old sync_core_before_usermode() comments suggested that a 
> >> non-icache-syncing
> >> return-to-usermode instruction is x86-specific and that all other
> >> architectures automatically notice cross-modified code on return to
> >> userspace.
> 
> >> +/*
> >> + * XXX: can a powerpc person put an appropriate comment here?
> >> + */
> >> +static inline void membarrier_sync_core_before_usermode(void)
> >> +{
> >> +}
> >> +
> >> +#endif /* _ASM_POWERPC_SYNC_CORE_H */
> > 
> > powerpc's can just go in asm/membarrier.h
> 
> $ ls arch/powerpc/include/asm/membarrier.h
> ls: cannot access 'arch/powerpc/include/asm/membarrier.h': No such file
> or directory

Which is because I deleted it.  Duh.  I'll clean this up.

> 
> 
> > 
> > /*
> >  * The RFI family of instructions are context synchronising, and
> >  * that is how we return to userspace, so nothing is required here.
> >  */
> 
> Thanks!
> 


Re: [PATCH 11/11] powerpc/microwatt: Disable interrupts in boot wrapper main program

2021-06-16 Thread Segher Boessenkool
On Tue, Jun 15, 2021 at 09:05:27AM +1000, Paul Mackerras wrote:
> This ensures that we don't get a decrementer interrupt arriving before
> we have set up a handler for it.

Maybe add a comment saying this is setting MSR[EE]=0 for that?  Or do
other bits here matter as well?


Segher


Re: [PATCH 01/11] powerpc: Add Microwatt platform

2021-06-16 Thread Paul Mackerras
On Wed, Jun 16, 2021 at 01:40:07PM -0500, Segher Boessenkool wrote:
> Hi Paul,
> 
> On Tue, Jun 15, 2021 at 08:57:43AM +1000, Paul Mackerras wrote:
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -422,7 +422,7 @@ config HUGETLB_PAGE_SIZE_VARIABLE
> >  
> >  config MATH_EMULATION
> > bool "Math emulation"
> > -   depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE
> > +   depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE || PPC_MICROWATT
> > select PPC_FPU_REGS
> 
> Why do you need this / want this, since you have FP hardware?

The FPU is optional, and doesn't fit in the smaller (-35T) version of
the Artix-7 that is readily available.

I should mention this in the commit message.

Paul.


[to-be-updated] mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t.patch removed from -mm tree

2021-06-16 Thread akpm


The patch titled
 Subject: mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t 
*
has been removed from the -mm tree.  Its filename was
 mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t.patch

This patch was dropped because an updated version will be merged

--
From: "Aneesh Kumar K.V" 
Subject: mm: rename p4d_page_vaddr to p4d_pgtable and make it return pud_t *

No functional change in this patch.

Link: 
https://lkml.kernel.org/r/20210615110859.320299-2-aneesh.ku...@linux.ibm.com
Link: 
https://lore.kernel.org/linuxppc-dev/CAHk-=wi+j+iodze9ftjm3zi4j4oes+qqbkxme9qn4roxpex...@mail.gmail.com/
Signed-off-by: Aneesh Kumar K.V 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Signed-off-by: Andrew Morton 
---

 arch/arm64/include/asm/pgtable.h|4 ++--
 arch/ia64/include/asm/pgtable.h |2 +-
 arch/mips/include/asm/pgtable-64.h  |4 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h|5 -
 arch/powerpc/include/asm/nohash/64/pgtable-4k.h |6 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c|2 +-
 arch/powerpc/mm/pgtable_64.c|2 +-
 arch/sparc/include/asm/pgtable_64.h |4 ++--
 arch/x86/include/asm/pgtable.h  |4 ++--
 arch/x86/mm/init_64.c   |4 ++--
 include/asm-generic/pgtable-nop4d.h |2 +-
 include/asm-generic/pgtable-nopud.h |2 +-
 include/linux/pgtable.h |2 +-
 13 files changed, 25 insertions(+), 18 deletions(-)

--- 
a/arch/arm64/include/asm/pgtable.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/arm64/include/asm/pgtable.h
@@ -694,9 +694,9 @@ static inline phys_addr_t p4d_page_paddr
return __p4d_to_phys(p4d);
 }
 
-static inline unsigned long p4d_page_vaddr(p4d_t p4d)
+static inline pud_t *p4d_pgtable(p4d_t p4d)
 {
-   return (unsigned long)__va(p4d_page_paddr(p4d));
+   return (pud_t *)__va(p4d_page_paddr(p4d));
 }
 
 /* Find an entry in the frst-level page table. */
--- 
a/arch/ia64/include/asm/pgtable.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/ia64/include/asm/pgtable.h
@@ -281,7 +281,7 @@ ia64_phys_addr_valid (unsigned long addr
 #define p4d_bad(p4d)   (!ia64_phys_addr_valid(p4d_val(p4d)))
 #define p4d_present(p4d)   (p4d_val(p4d) != 0UL)
 #define p4d_clear(p4dp)(p4d_val(*(p4dp)) = 0UL)
-#define p4d_page_vaddr(p4d)((unsigned long) __va(p4d_val(p4d) & 
_PFN_MASK))
+#define p4d_pgtable(p4d)   ((pud_t *) __va(p4d_val(p4d) & 
_PFN_MASK))
 #define p4d_page(p4d)  virt_to_page((p4d_val(p4d) + 
PAGE_OFFSET))
 #endif
 
--- 
a/arch/mips/include/asm/pgtable-64.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/mips/include/asm/pgtable-64.h
@@ -209,9 +209,9 @@ static inline void p4d_clear(p4d_t *p4dp
p4d_val(*p4dp) = (unsigned long)invalid_pud_table;
 }
 
-static inline unsigned long p4d_page_vaddr(p4d_t p4d)
+static inline pud_t *p4d_pgtable(p4d_t p4d)
 {
-   return p4d_val(p4d);
+   return (pud_t *)p4d_val(p4d);
 }
 
 #define p4d_phys(p4d)  virt_to_phys((void *)p4d_val(p4d))
--- 
a/arch/powerpc/include/asm/book3s/64/pgtable.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/powerpc/include/asm/book3s/64/pgtable.h
@@ -1048,7 +1048,10 @@ extern struct page *p4d_page(p4d_t p4d);
 /* Pointers in the page table tree are physical addresses */
 #define __pgtable_ptr_val(ptr) __pa(ptr)
 
-#define p4d_page_vaddr(p4d)__va(p4d_val(p4d) & ~P4D_MASKED_BITS)
+static inline pud_t *p4d_pgtable(p4d_t p4d)
+{
+   return (pud_t *)__va(p4d_val(p4d) & ~P4D_MASKED_BITS);
+}
 
 static inline pmd_t *pud_pgtable(pud_t pud)
 {
--- 
a/arch/powerpc/include/asm/nohash/64/pgtable-4k.h~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/powerpc/include/asm/nohash/64/pgtable-4k.h
@@ -56,10 +56,14 @@
 #define p4d_none(p4d)  (!p4d_val(p4d))
 #define p4d_bad(p4d)   (p4d_val(p4d) == 0)
 #define p4d_present(p4d)   (p4d_val(p4d) != 0)
-#define p4d_page_vaddr(p4d)(p4d_val(p4d) & ~P4D_MASKED_BITS)
 
 #ifndef __ASSEMBLY__
 
+static inline pud_t *p4d_pgtable(p4d_t p4d)
+{
+   return (pud_t *) (p4d_val(p4d) & ~P4D_MASKED_BITS);
+}
+
 static inline void p4d_clear(p4d_t *p4dp)
 {
*p4dp = __p4d(0);
--- 
a/arch/powerpc/mm/book3s64/radix_pgtable.c~mm-rename-p4d_page_vaddr-to-p4d_pgtable-and-make-it-return-pud_t
+++ a/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -860,7 +860,7 @@ static void __meminit remove_pagetable(u
continue;
}
 
-   pud_base = (pud_t *)p4d_page_vaddr(*p4d);
+   pud_base = p4d_pgtable(*p4d);
remove_pud_table(pud_base, addr, 

[to-be-updated] mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t.patch removed from -mm tree

2021-06-16 Thread akpm


The patch titled
 Subject: mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t 
*
has been removed from the -mm tree.  Its filename was
 mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t.patch

This patch was dropped because an updated version will be merged

--
From: "Aneesh Kumar K.V" 
Subject: mm: rename pud_page_vaddr to pud_pgtable and make it return pmd_t *

No functional change in this patch.

Link: 
https://lkml.kernel.org/r/20210615110859.320299-1-aneesh.ku...@linux.ibm.com
Link: 
https://lore.kernel.org/linuxppc-dev/CAHk-=wi+j+iodze9ftjm3zi4j4oes+qqbkxme9qn4roxpex...@mail.gmail.com/
Signed-off-by: Aneesh Kumar K.V 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Cc: 
Signed-off-by: Andrew Morton 
---

 arch/alpha/include/asm/pgtable.h |8 +---
 arch/arm/include/asm/pgtable-3level.h|2 +-
 arch/arm64/include/asm/pgtable.h |4 ++--
 arch/ia64/include/asm/pgtable.h  |2 +-
 arch/m68k/include/asm/motorola_pgtable.h |2 +-
 arch/mips/include/asm/pgtable-64.h   |4 ++--
 arch/parisc/include/asm/pgtable.h|4 ++--
 arch/powerpc/include/asm/book3s/64/pgtable.h |6 +-
 arch/powerpc/include/asm/nohash/64/pgtable.h |6 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c |4 ++--
 arch/powerpc/mm/pgtable_64.c |2 +-
 arch/riscv/include/asm/pgtable-64.h  |4 ++--
 arch/sh/include/asm/pgtable-3level.h |4 ++--
 arch/sparc/include/asm/pgtable_32.h  |4 ++--
 arch/sparc/include/asm/pgtable_64.h  |6 +++---
 arch/um/include/asm/pgtable-3level.h |2 +-
 arch/x86/include/asm/pgtable.h   |4 ++--
 arch/x86/mm/pat/set_memory.c |4 ++--
 arch/x86/mm/pgtable.c|2 +-
 include/asm-generic/pgtable-nopmd.h  |2 +-
 include/asm-generic/pgtable-nopud.h  |2 +-
 include/linux/pgtable.h  |2 +-
 22 files changed, 45 insertions(+), 35 deletions(-)

--- 
a/arch/alpha/include/asm/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/alpha/include/asm/pgtable.h
@@ -236,8 +236,10 @@ pmd_page_vaddr(pmd_t pmd)
 #define pmd_page(pmd)  (pfn_to_page(pmd_val(pmd) >> 32))
 #define pud_page(pud)  (pfn_to_page(pud_val(pud) >> 32))
 
-extern inline unsigned long pud_page_vaddr(pud_t pgd)
-{ return PAGE_OFFSET + ((pud_val(pgd) & _PFN_MASK) >> (32-PAGE_SHIFT)); }
+static inline pmd_t *pud_pgtable(pud_t pgd)
+{
+   return (pmd_t *)(PAGE_OFFSET + ((pud_val(pgd) & _PFN_MASK) >> 
(32-PAGE_SHIFT)));
+}
 
 extern inline int pte_none(pte_t pte)  { return !pte_val(pte); }
 extern inline int pte_present(pte_t pte)   { return pte_val(pte) & 
_PAGE_VALID; }
@@ -287,7 +289,7 @@ extern inline pte_t pte_mkyoung(pte_t pt
 /* Find an entry in the second-level page table.. */
 extern inline pmd_t * pmd_offset(pud_t * dir, unsigned long address)
 {
-   pmd_t *ret = (pmd_t *) pud_page_vaddr(*dir) + ((address >> PMD_SHIFT) & 
(PTRS_PER_PAGE - 1));
+   pmd_t *ret = pud_pgtable(*dir) + ((address >> PMD_SHIFT) & 
(PTRS_PER_PAGE - 1));
smp_rmb(); /* see above */
return ret;
 }
--- 
a/arch/arm64/include/asm/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/arm64/include/asm/pgtable.h
@@ -633,9 +633,9 @@ static inline phys_addr_t pud_page_paddr
return __pud_to_phys(pud);
 }
 
-static inline unsigned long pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
-   return (unsigned long)__va(pud_page_paddr(pud));
+   return (pmd_t *)__va(pud_page_paddr(pud));
 }
 
 /* Find an entry in the second-level page table. */
--- 
a/arch/arm/include/asm/pgtable-3level.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/arm/include/asm/pgtable-3level.h
@@ -130,7 +130,7 @@
flush_pmd_entry(pudp);  \
} while (0)
 
-static inline pmd_t *pud_page_vaddr(pud_t pud)
+static inline pmd_t *pud_pgtable(pud_t pud)
 {
return __va(pud_val(pud) & PHYS_MASK & (s32)PAGE_MASK);
 }
--- 
a/arch/ia64/include/asm/pgtable.h~mm-rename-pud_page_vaddr-to-pud_pgtable-and-make-it-return-pmd_t
+++ a/arch/ia64/include/asm/pgtable.h
@@ -273,7 +273,7 @@ ia64_phys_addr_valid (unsigned long addr
 #define pud_bad(pud)   (!ia64_phys_addr_valid(pud_val(pud)))
 #define pud_present(pud)   (pud_val(pud) != 0UL)
 #define pud_clear(pudp)(pud_val(*(pudp)) = 0UL)
-#define pud_page_vaddr(pud)((unsigned long) __va(pud_val(pud) & 
_PFN_MASK))
+#define pud_pgtable(pud)   ((pmd_t *) __va(pud_val(pud) & 
_PFN_MASK))
 #define pud_page(pud)  virt_to_page((pud_val(pud) + 
PAGE_OFFSET))
 
 #if CONFIG_PGTABLE_LEVELS == 4
--- 

Re: [PATCH 07/11] powerpc: Add support for microwatt's hardware random number generator

2021-06-16 Thread Paul Mackerras
On Wed, Jun 16, 2021 at 11:16:02PM +1000, Michael Ellerman wrote:
> Nicholas Piggin  writes:
> > I would be happier if you didn't change this (or at least put it in its 
> > own patch explaining why it's not going to slow down other platforms).
> 
> It would essentially be a revert of 01c9348c7620 ("powerpc: Use hardware
> RNG for arch_get_random_seed_* not arch_get_random_*")
> 
> Which would be ironic :)

You expect me to remember things I did 6 years ago? :)

I'll take this part out.  My thinking originally was that since darn
on Microwatt is a single-cycle instruction, it would be faster to use
darn every time rather than run a software PRNG seeded from darn.
It's not critical though.

Paul.


Re: [PATCH 06/11] powerpc: microwatt: Use standard 16550 UART for console

2021-06-16 Thread Segher Boessenkool
On Tue, Jun 15, 2021 at 09:01:27AM +1000, Paul Mackerras wrote:
> This adds support to the Microwatt platform to use the standard
> 1655-style UART which available in the standalone Microwatt FPGA.

16550... 1655 is a DAC apparently :-)


Segher


[PATCH] crypto: nx: Fix memcpy() over-reading in nonce

2021-06-16 Thread Kees Cook
Fix typo in memcpy() where size should be CTR_RFC3686_NONCE_SIZE.

Fixes: 030f4e968741 ("crypto: nx - Fix reentrancy bugs")
Cc: sta...@vger.kernel.org
Signed-off-by: Kees Cook 
---
 drivers/crypto/nx/nx-aes-ctr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/nx/nx-aes-ctr.c b/drivers/crypto/nx/nx-aes-ctr.c
index 13f518802343..6120e350ff71 100644
--- a/drivers/crypto/nx/nx-aes-ctr.c
+++ b/drivers/crypto/nx/nx-aes-ctr.c
@@ -118,7 +118,7 @@ static int ctr3686_aes_nx_crypt(struct skcipher_request 
*req)
struct nx_crypto_ctx *nx_ctx = crypto_skcipher_ctx(tfm);
u8 iv[16];
 
-   memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_IV_SIZE);
+   memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_NONCE_SIZE);
memcpy(iv + CTR_RFC3686_NONCE_SIZE, req->iv, CTR_RFC3686_IV_SIZE);
iv[12] = iv[13] = iv[14] = 0;
iv[15] = 1;
-- 
2.25.1
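
For context, the 16-byte counter block assembled above is laid out like this
(sizes from <crypto/ctr.h>: CTR_RFC3686_NONCE_SIZE = 4, CTR_RFC3686_IV_SIZE = 8):

	iv[0..3]    nonce from the key (nx_ctx->priv.ctr.nonce, 4 bytes)
	iv[4..11]   per-request IV (req->iv, 8 bytes)
	iv[12..15]  block counter, initialised to 1

so the old memcpy() of CTR_RFC3686_IV_SIZE (8) bytes read 4 bytes past the
end of the 4-byte nonce field.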



Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-16 Thread Andy Lutomirski
On 6/15/21 9:45 PM, Nicholas Piggin wrote:
> Excerpts from Andy Lutomirski's message of June 16, 2021 1:21 pm:
>> The old sync_core_before_usermode() comments suggested that a 
>> non-icache-syncing
>> return-to-usermode instruction is x86-specific and that all other
>> architectures automatically notice cross-modified code on return to
>> userspace.

>> +/*
>> + * XXX: can a powerpc person put an appropriate comment here?
>> + */
>> +static inline void membarrier_sync_core_before_usermode(void)
>> +{
>> +}
>> +
>> +#endif /* _ASM_POWERPC_SYNC_CORE_H */
> 
> powerpc's can just go in asm/membarrier.h

$ ls arch/powerpc/include/asm/membarrier.h
ls: cannot access 'arch/powerpc/include/asm/membarrier.h': No such file
or directory


> 
> /*
>  * The RFI family of instructions are context synchronising, and
>  * that is how we return to userspace, so nothing is required here.
>  */
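
Putting that together, the powerpc header could end up as small as this
sketch:

	/* arch/powerpc/include/asm/membarrier.h (sketch) */
	static inline void membarrier_sync_core_before_usermode(void)
	{
		/*
		 * The RFI family of instructions are context synchronising, and
		 * that is how we return to userspace, so nothing is required here.
		 */
	}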

Thanks!


Re: [PATCH 01/11] powerpc: Add Microwatt platform

2021-06-16 Thread Segher Boessenkool
Hi Paul,

On Tue, Jun 15, 2021 at 08:57:43AM +1000, Paul Mackerras wrote:
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -422,7 +422,7 @@ config HUGETLB_PAGE_SIZE_VARIABLE
>  
>  config MATH_EMULATION
>   bool "Math emulation"
> - depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE
> + depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE || PPC_MICROWATT
>   select PPC_FPU_REGS

Why do you need this / want this, since you have FP hardware?


Segher


Re: [PATCH v13 3/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2021-06-16 Thread Christophe Leroy




On 16/06/2021 at 11:07, Marco Elver wrote:

On Wed, 16 Jun 2021 at 10:03, Daniel Axtens  wrote:
[...]

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 768d7d342757..fd65f477ac92 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -40,10 +40,22 @@ struct kunit_kasan_expectation {
  #define PTE_HWTABLE_PTRS 0
  #endif

+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif


This is introducing new global constants in a KASAN header. It
feels like this should be in <linux/pgtable.h> together with a
comment. Because the KASAN header is so widely included, most of
the kernel will get these new definitions. That in itself is fine,
but it feels wrong that the KASAN header introduces these.

Thoughts?

Sorry for only realizing this now.


My idea here was to follow the same road as MAX_PTRS_PER_P4D, added by commit 
https://github.com/linuxppc/linux/commit/c65e774f


That commit spread MAX_PTRS_PER_P4D everywhere.

Instead of doing the same, we found that it would be better to define a
fallback for when the architecture doesn't define MAX_PTRS_PER_PxD. Now, if it
is to be made more global in pgtable.h, I'd suggest also including
MAX_PTRS_PER_P4D in the dance, to avoid architectures like s390 having to
define it, or even to avoid defining it in asm-generic/pgtable-nop4d.h at all.


Christophe



Thanks,
-- Marco


  extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
  extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];

  int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 348f31d15a97..cc64ed6858c6 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
  }
  #endif
  #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
  static inline bool kasan_pud_table(p4d_t p4d)
  {
 return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
  }
  #endif
  #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
  static inline bool kasan_pmd_table(pud_t pud)
  {
 return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
 return false;
  }
  #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
 __page_aligned_bss;

  static inline bool kasan_pte_table(pmd_t pmd)
--
2.30.2

--
You received this message because you are subscribed to the Google Groups 
"kasan-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to kasan-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/kasan-dev/20210616080244.51236-4-dja%40axtens.net.


Re: [dm-devel] [PATCH 06/18] bvec: add a bvec_kmap_local helper

2021-06-16 Thread Bart Van Assche
On 6/15/21 6:24 AM, Christoph Hellwig wrote:
> +/**
> + * bvec_kmap_local - map a bvec into the kernel virtual address space
> + * @bvec: bvec to map
> + *
> + * Must be called on single-page bvecs only.  Call kunmap_local on the 
> returned
> + * address to unmap.
> + */
> +static inline void *bvec_kmap_local(struct bio_vec *bvec)
> +{
> + return kmap_local_page(bvec->bv_page) + bvec->bv_offset;
> +}

Hi Christoph,

Would it be appropriate to add WARN_ON_ONCE(bvec->bv_offset >=
PAGE_SIZE) in this function?
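
That is, something like this sketch:

	static inline void *bvec_kmap_local(struct bio_vec *bvec)
	{
		/* catch multi-page bvecs, which this helper cannot handle */
		WARN_ON_ONCE(bvec->bv_offset >= PAGE_SIZE);
		return kmap_local_page(bvec->bv_page) + bvec->bv_offset;
	}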

Thanks,

Bart.


Re: switch the block layer to use kmap_local_page v2

2021-06-16 Thread Martin K. Petersen


Christoph,

> this series switches the core block layer code and all users of the
> existing bvec kmap helpers to use kmap_local_page.  Drivers that
> currently use open coded kmap_atomic calls will converted in a follow
> on series.

Looks OK to me.

Reviewed-by: Martin K. Petersen 

-- 
Martin K. Petersen  Oracle Linux Engineering


Re: [PATCH 11/18] ps3disk: use memcpy_{from,to}_bvec

2021-06-16 Thread Geoff Levand
Hi Christoph,

On 6/15/21 6:24 AM, Christoph Hellwig wrote:
> Use the bvec helpers instead of open coding the copy.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/block/ps3disk.c | 19 +++
>  1 file changed, 3 insertions(+), 16 deletions(-)

I tested your patch set applied to v5.13-rc6 on PS3 and it seemed to be
working OK.

I did some rsync's, some dd's, some fsck's, etc.  If you have anything
you could suggest that you think would exercise your changes I could
try that also.

Tested-by: Geoff Levand 


Re: [PATCH v19 05/13] of: Add a common kexec FDT setup function

2021-06-16 Thread Rob Herring
On Tue, Jun 15, 2021 at 8:23 PM Michael Ellerman  wrote:
>
> Rob Herring  writes:
> > On Tue, Jun 15, 2021 at 10:13 AM nramas  wrote:
> >>
> >> On Tue, 2021-06-15 at 08:01 -0600, Rob Herring wrote:
> >> > On Tue, Jun 15, 2021 at 6:18 AM Geert Uytterhoeven <
> >> > ge...@linux-m68k.org> wrote:
> >> > >
> >> > > > +void *of_kexec_alloc_and_setup_fdt(const struct kimage *image,
> >> > > > +  unsigned long
> >> > > > initrd_load_addr,
> >> > > > +  unsigned long initrd_len,
> >> > > > +  const char *cmdline, size_t
> >> > > > extra_fdt_size)
> >> > > > +{
> >> > > > +   /* Did we boot using an initrd? */
> >> > > > +   prop = fdt_getprop(fdt, chosen_node, "linux,initrd-
> >> > > > start", NULL);
> >> > > > +   if (prop) {
> >> > > > +   u64 tmp_start, tmp_end, tmp_size;
> >> > > > +
> >> > > > +   tmp_start = fdt64_to_cpu(*((const fdt64_t *)
> >> > > > prop));
> >> > > > +
> >> > > > +   prop = fdt_getprop(fdt, chosen_node,
> >> > > > "linux,initrd-end", NULL);
> >> > > > +   if (!prop) {
> >> > > > +   ret = -EINVAL;
> >> > > > +   goto out;
> >> > > > +   }
> >> > > > +
> >> > > > +   tmp_end = fdt64_to_cpu(*((const fdt64_t *)
> >> > > > prop));
> >> > >
> >> > > Some kernel code assumes "linux,initrd-{start,end}" are 64-bit,
> >> > > other code assumes 32-bit.
> >> >
> >> > It can be either. The above code was a merge of arm64 and powerpc
> >> > both of which use 64-bit and still only runs on those arches. It
> >> > looks like some powerpc platforms may use 32-bit, but this would
> >> > have been broken before.
>
> >> of_kexec_alloc_and_setup_fdt() is called from elf_64.c (in
> >> arch/powerpc/kexec) which is for 64-bit powerpc platform only.
> >
> > 64-bit PPC could be writing 32-bit property values. The architecture
> > size doesn't necessarily matter. And if the values came from the
> > bootloader, who knows what size it used.
> >
> > This code is 32-bit powerpc only?:
> >
> > arch/powerpc/boot/main.c-   /* Tell the kernel initrd address via device tree */
> > arch/powerpc/boot/main.c:   setprop_val(chosen, "linux,initrd-start", (u32)(initrd_addr));
> > arch/powerpc/boot/main.c-   setprop_val(chosen, "linux,initrd-end", (u32)(initrd_addr+initrd_size));
>
> Historically that code was always built 32-bit, even when used with a
> 64-bit kernel.
>
> These days it is also built 64-bit (for ppc64le).

How it is built is immaterial. It's always writing a 32-bit value due
to the u32 cast.

> It looks like the drivers/of/fdt.c code can handle either 64 or 32-bit,
> so I guess that's why it seems to be working.

Yes, that works, but that's not the issue. The question is does the
main.c code run in combination with kexec. The kexec code above
(copied straight from PPC code) would not work if linux,initrd-* are
written by the main.c code.
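
For illustration, a length-tolerant read looks roughly like this (the helper
name is made up; drivers/of/fdt.c does the equivalent):

	static u64 read_initrd_prop(const void *fdt, int node, const char *name)
	{
		const void *prop;
		int len;

		prop = fdt_getprop(fdt, node, name, &len);
		if (!prop)
			return 0;
		/* the bootloader may have written a 32- or a 64-bit value */
		if (len == sizeof(fdt32_t))
			return fdt32_to_cpu(*(const fdt32_t *)prop);
		return fdt64_to_cpu(*(const fdt64_t *)prop);
	}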

Rob


[PATCH] PS3: Remove unneeded semicolons

2021-06-16 Thread Wan Jiabing
Fix coccicheck warning:

./arch/powerpc/platforms/ps3/system-bus.c:607:2-3: Unneeded semicolon
./arch/powerpc/platforms/ps3/system-bus.c:766:2-3: Unneeded semicolon

Signed-off-by: Wan Jiabing 
---
 arch/powerpc/platforms/ps3/system-bus.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/ps3/system-bus.c 
b/arch/powerpc/platforms/ps3/system-bus.c
index 1a5665875165..f57f37fe038c 100644
--- a/arch/powerpc/platforms/ps3/system-bus.c
+++ b/arch/powerpc/platforms/ps3/system-bus.c
@@ -604,7 +604,7 @@ static dma_addr_t ps3_ioc0_map_page(struct device *_dev, 
struct page *page,
default:
/* not happned */
BUG();
-   };
+   }
result = ps3_dma_map(dev->d_region, (unsigned long)ptr, size,
			     &bus_addr, iopte_flag);
 
@@ -763,7 +763,7 @@ int ps3_system_bus_device_register(struct 
ps3_system_bus_device *dev)
break;
default:
BUG();
-   };
+   }
 
dev->core.of_node = NULL;
set_dev_node(&dev->core, 0);
-- 
2.30.2



Re: [PATCH v2 0/6] mrermap fixes

2021-06-16 Thread Linus Torvalds
On Tue, Jun 15, 2021 at 9:53 PM Aneesh Kumar K.V
 wrote:
>
> This patch series is split out from [PATCH v7 00/11] Speedup mremap on
> ppc64
> (https://lore.kernel.org/linux-mm/20210607055131.156184-1-aneesh.ku...@linux.ibm.com)
> dropping ppc64 specific changes.

Both this and the followup powerpc enablement looks ok to me. Apart
from the obvious subject line bug ;)

Do we have robot confirmation that this version doesn't have any
performance regression?

  Linus


Re: [PATCH v2 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-06-16 Thread Andy Shevchenko
On Wed, Jun 16, 2021 at 07:17:03PM +0530, Aneesh Kumar K.V wrote:
> On 6/16/21 7:13 PM, Andy Shevchenko wrote:
> > Parse to and export from UUID own type, before dereferencing.
> > This also fixes wrong comment (Little Endian UUID is something else)
> > and should eliminate the direct strict types assignments.
> > 
> > Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset 
> > cookie")
> > Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for 
> > storing uuid from the device tree")
> 
> Do we need the Fixes: there? It didn't change any functionality right? The
> format with which we stored cookie1 remains the same with older and newer
> code. The newer one is better?

Depends if you are okay with Sparse warnings.

> Reviewed-by: Aneesh Kumar K.V 

Thanks for review!

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH 00/21] Rid W=1 warnings from IDE

2021-06-16 Thread Christoph Hellwig
On Mon, Jun 14, 2021 at 10:12:28AM +0100, Lee Jones wrote:
> On Mon, 07 Jun 2021, Christoph Hellwig wrote:
> 
> > Please don't touch this code as it is about to be removed entirely.
> 
> Do you have an ETA for this work?

I just resent the series.


Re: [PATCH v4 1/2] module: add elf_check_module_arch for module specific elf arch checks

2021-06-16 Thread Jessica Yu

+++ Michael Ellerman [16/06/21 12:37 +1000]:

Jessica Yu  writes:

+++ Nicholas Piggin [15/06/21 12:05 +1000]:

Excerpts from Jessica Yu's message of June 14, 2021 10:06 pm:

+++ Nicholas Piggin [11/06/21 19:39 +1000]:

The elf_check_arch() function is used to test usermode binaries, but
kernel modules may have more specific requirements. powerpc would like
to test for ABI version compatibility.

Add an arch-overridable function elf_check_module_arch() that defaults
to elf_check_arch() and use it in elf_validity_check().

Signed-off-by: Michael Ellerman 
[np: split patch, added changelog]
Signed-off-by: Nicholas Piggin 
---
include/linux/moduleloader.h | 5 +
kernel/module.c  | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..fdc042a84562 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -13,6 +13,11 @@
 * must be implemented by each architecture.
 */

+// Allow arch to optionally do additional checking of module ELF header
+#ifndef elf_check_module_arch
+#define elf_check_module_arch elf_check_arch
+#endif


Hi Nicholas,

Why not make elf_check_module_arch() consistent with the other
arch-specific functions? Please see module_frob_arch_sections(),
module_{init,exit}_section(), etc in moduleloader.h. That is, they are
all __weak functions that are overridable by arches. We can maybe make
elf_check_module_arch() a weak symbol, available for arches to
override if they want to perform additional elf checks. Then we don't
have to have this one-off #define.



Like this? I like it. Good idea.


Yeah! Also, maybe we can alternatively make elf_check_module_arch() a
separate check entirely so that the powerpc implementation doesn't
have to include that extra elf_check_arch() call. Something like this maybe?


My thinking for making elf_check_module_arch() the only hook was that
conceivably you might not want/need to call elf_check_arch() from
elf_check_module_arch().

So having a single module specific hook allows arch code to decide
how to implement the check, which may or may not involve calling
elf_check_arch(), but that becomes an arch implementation detail.


Thanks for the feedback! Yeah, that's fair too. Well, I ended up doing
it this way mostly to create less churn/change of behavior, since in
its current state elf_check_arch() is already being called for each
arch. Additionally I wanted to save the powerpc implementation of
elf_check_module_arch() an extra elf_check_arch() call. In any case I
have a slight preference for having a second hook to allow arches add
any extra checks in addition to elf_check_arch(). Thanks!
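
Sketched out, the two-hook arrangement being discussed would look
roughly like this (illustrative sketch only; the hook name comes from
the thread, the surrounding details are assumptions):

/* kernel/module.c: weak default that arches may override. */
bool __weak elf_check_module_arch(Elf_Ehdr *hdr)
{
	return true;
}

/* In elf_validity_check(): the usermode ABI check stays, and the
 * module-specific hook runs in addition to it. */
static int elf_validity_check(struct load_info *info)
{
	if (!elf_check_arch(info->hdr)
	    || !elf_check_module_arch(info->hdr)
	    || info->hdr->e_shentsize != sizeof(Elf_Shdr))
		return -ENOEXEC;
	return 0;
}

An arch override then only carries the extra constraint, e.g. powerpc
checking the ELF ABI version in hdr->e_flags, without re-doing the
generic elf_check_arch() test.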


Re: [PATCH v2 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-06-16 Thread Aneesh Kumar K.V

On 6/16/21 7:13 PM, Andy Shevchenko wrote:

Parse to and export from UUID own type, before dereferencing.
This also fixes wrong comment (Little Endian UUID is something else)
and should eliminate the direct strict types assignments.

Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie")
Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for storing 
uuid from the device tree")



Do we need the Fixes: there? It didn't change any functionality right? 
The format with which we stored cookie1 remains the same with older and 
newer code. The newer one is better?


Reviewed-by: Aneesh Kumar K.V 


Cc: Oliver O'Halloran 
Cc: Aneesh Kumar K.V 
Signed-off-by: Andy Shevchenko 
---
v2: added missed header (Vaibhav), updated comment (Aneesh),
 rewrite part of the commit message to avoid mentioning the Sparse
  arch/powerpc/platforms/pseries/papr_scm.c | 27 +++
  1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index e2b69cc3beaf..b43be41e8ff7 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -18,6 +18,7 @@
  #include 
  #include 
  #include 
+#include <asm/unaligned.h>
  
  #define BIND_ANY_ADDR (~0ul)
  
@@ -1101,8 +1102,9 @@ static int papr_scm_probe(struct platform_device *pdev)

u32 drc_index, metadata_size;
u64 blocks, block_size;
struct papr_scm_priv *p;
+   u8 uuid_raw[UUID_SIZE];
const char *uuid_str;
-   u64 uuid[2];
+   uuid_t uuid;
int rc;
  
  	/* check we have all the required DT properties */

@@ -1145,16 +1147,23 @@ static int papr_scm_probe(struct platform_device *pdev)
p->hcall_flush_required = of_property_read_bool(dn, 
"ibm,hcall-flush-required");
  
  	/* We just need to ensure that set cookies are unique across */

-   uuid_parse(uuid_str, (uuid_t *) uuid);
+   uuid_parse(uuid_str, &uuid);
+
/*
-* cookie1 and cookie2 are not really little endian
-* we store a little endian representation of the
-* uuid str so that we can compare this with the label
-* area cookie irrespective of the endian config with which
-* the kernel is built.
+* The cookie1 and cookie2 are not really little endian.
+* We store a raw buffer representation of the
+* uuid string so that we can compare this with the label
+* area cookie irrespective of the endian configuration
+* with which the kernel is built.
+*
+* Historically we stored the cookie in the below format.
+* for a uuid string 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa
+*  cookie1 was 0xfd423b0b671b5172
+*  cookie2 was 0xaabce8cae35b1d8d
 */
-   p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
-   p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
+   export_uuid(uuid_raw, &uuid);
+   p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]);
+   p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]);
  
  	/* might be zero */

p->metadata_size = metadata_size;





[PATCH 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes

2021-06-16 Thread Pratik R. Sampat
Adds a generic interface to represent the energy and frequency related
PAPR attributes on the system using the new H_CALL
"H_GET_ENERGY_SCALE_INFO".

H_GET_EM_PARMS H_CALL was previously responsible for exporting this
information in the lparcfg, however the H_GET_EM_PARMS H_CALL
will be deprecated from P10 onwards.

The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
hcall(
  uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
  uint64 flags,   // Per the flag request
  uint64 firstAttributeId,// The attribute id
  uint64 bufferAddress,   // Guest physical address of the output buffer
  uint64 bufferSize   // The size in bytes of the output buffer
);

This H_CALL can query either all the attributes at once with
firstAttributeId = 0, flags = 0, or a single attribute at a time with
firstAttributeId = id.

The output buffer consists of the following
1. number of attributes  - 8 bytes
2. array offset to the data location - 8 bytes
3. version info  - 1 byte
4. A data array of size num attributes, which contains the following:
  a. attribute ID  - 8 bytes
  b. attribute value in number - 8 bytes
  c. attribute name in string  - 64 bytes
  d. attribute value in string - 64 bytes
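
Laid out as C structs, the buffer described above would look roughly
like this (illustrative sketch; the struct/field names and the
big-endian typing are assumptions, only the sizes come from the list):

struct energy_scale_attr {
	__be64 id;		/* a. attribute ID */
	__be64 val;		/* b. attribute value in number */
	u8 desc[64];		/* c. attribute name in string */
	u8 value_desc[64];	/* d. attribute value in string */
};

struct energy_scale_buf_hdr {
	__be64 num_attrs;	/* 1. number of attributes */
	__be64 array_offset;	/* 2. array offset to the data location */
	u8 data_version;	/* 3. version info */
	/* followed by num_attrs struct energy_scale_attr entries */
};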

The new H_CALL exports information in direct string value format, hence
a new interface has been introduced in
/sys/firmware/papr/energy_scale_info to export this information to
userspace in an extensible pass-through format.

The H_CALL returns the name, numeric value and string value (if it exists).

The format of exposing the sysfs information is as follows:
/sys/firmware/papr/energy_scale_info/
   |-- /
 |-- desc
 |-- value
 |-- value_desc (if exists)
   |-- /
 |-- desc
 |-- value
 |-- value_desc (if exists)
...

The energy information that is exported is useful for userspace tools
such as powerpc-utils. Currently these tools infer the
"power_mode_data" value in the lparcfg, which in turn is obtained from
the to be deprecated H_GET_EM_PARMS H_CALL.
On future platforms, such userspace utilities will have to look at the
data returned from the new H_CALL being populated in this new sysfs
interface and report this information directly without the need of
interpretation.

Signed-off-by: Pratik R. Sampat 
---
 .../sysfs-firmware-papr-energy-scale-info |  26 ++
 arch/powerpc/include/asm/hvcall.h |  21 +-
 arch/powerpc/kvm/trace_hv.h   |   1 +
 arch/powerpc/platforms/pseries/Makefile   |   3 +-
 .../pseries/papr_platform_attributes.c| 292 ++
 5 files changed, 341 insertions(+), 2 deletions(-)
 create mode 100644 
Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
 create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info 
b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
new file mode 100644
index ..499bc1ae173a
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
@@ -0,0 +1,26 @@
+What:  /sys/firmware/papr/energy_scale_info
+Date:  June 2021
+Contact:   Linux for PowerPC mailing list 
+Description:   Directory hosting a set of platform attributes on Linux
+   running as a PAPR guest.
+
+   Each file in a directory contains a platform
+   attribute hierarchy pertaining to performance/
+   energy-savings mode and processor frequency.
+
+What:  /sys/firmware/papr/energy_scale_info/
+   /sys/firmware/papr/energy_scale_info//desc
+   /sys/firmware/papr/energy_scale_info//value
+   /sys/firmware/papr/energy_scale_info//value_desc
+Date:  June 2021
+Contact:   Linux for PowerPC mailing list 
+Description:   PAPR attributes directory for POWERVM servers
+
+   This directory provides PAPR information. It
+   contains below sysfs attributes:
+
+   - desc: File contains the name of attribute 
+
+   - value: Numeric value of attribute 
+
+   - value_desc: String value of attribute 
diff --git a/arch/powerpc/include/asm/hvcall.h 
b/arch/powerpc/include/asm/hvcall.h
index e3b29eda8074..19a2a8c77a49 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -316,7 +316,8 @@
 #define H_SCM_PERFORMANCE_STATS 0x418
 #define H_RPT_INVALIDATE   0x448
 #define H_SCM_FLUSH0x44C
-#define MAX_HCALL_OPCODE   H_SCM_FLUSH
+#define H_GET_ENERGY_SCALE_INFO	0x450
+#define MAX_HCALL_OPCODE   H_GET_ENERGY_SCALE_INFO
 
 /* Scope args for H_SCM_UNBIND_ALL */
 #define H_UNBIND_SCOPE_ALL (0x1)
@@ -631,6 +632,24 @@ struct hv_gpci_request_buffer {
uint8_t bytes[HGPCI_MAX_DATA_BYTES];
 } __packed;
 
+#define MAX_EM_ATTRS   10
+#define MAX_EM_DATA_BYTES \
+   (sizeof(struct energy_scale_attributes) 

[PATCH 0/1] Interface to represent PAPR firmware attributes

2021-06-16 Thread Pratik R. Sampat
RFC --> v1
RFC: https://lkml.org/lkml/2021/6/4/791

Changelog:
Overhaul in interface design to the following:
/sys/firmware/papr/energy_scale_info/
   |-- /
 |-- desc
 |-- value
 |-- value_desc (if exists)
   |-- /
 |-- desc
 |-- value
 |-- value_desc (if exists)

Also implemented a POC using this interface for the powerpc-utils'
ppc64_cpu --frequency command-line tool to utilize this information
in userspace.
The POC for the new interface has been hosted here:
https://github.com/pratiksampat/powerpc-utils/tree/H_GET_ENERGY_SCALE_INFO_v2

Sample output from the powerpc-utils tool is as follows:

# ppc64_cpu --frequency
Power and Performance Mode: 
Idle Power Saver Status   : 
Processor Folding Status  :  --> Printed if Idle power save status is 
supported

Platform reported frequencies --> Frequencies reported from the platform's 
H_CALL i.e PAPR interface
min: GHz
max: GHz
static : GHz

Tool Computed frequencies
min: GHz (cpu XX)
max: GHz (cpu XX)
avg: GHz

Pratik R. Sampat (1):
  powerpc/pseries: Interface to represent PAPR firmware attributes

 .../sysfs-firmware-papr-energy-scale-info |  26 ++
 arch/powerpc/include/asm/hvcall.h |  21 +-
 arch/powerpc/kvm/trace_hv.h   |   1 +
 arch/powerpc/platforms/pseries/Makefile   |   3 +-
 .../pseries/papr_platform_attributes.c| 292 ++
 5 files changed, 341 insertions(+), 2 deletions(-)
 create mode 100644 
Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
 create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c

-- 
2.30.2



[PATCH v2 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-06-16 Thread Andy Shevchenko
Parse to and export from UUID own type, before dereferencing.
This also fixes wrong comment (Little Endian UUID is something else)
and should eliminate the direct strict types assignments.

Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset cookie")
Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format for 
storing uuid from the device tree")
Cc: Oliver O'Halloran 
Cc: Aneesh Kumar K.V 
Signed-off-by: Andy Shevchenko 
---
v2: added missed header (Vaibhav), updated comment (Aneesh),
rewrite part of the commit message to avoid mentioning the Sparse
 arch/powerpc/platforms/pseries/papr_scm.c | 27 +++
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index e2b69cc3beaf..b43be41e8ff7 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <asm/unaligned.h>
 
 #define BIND_ANY_ADDR (~0ul)
 
@@ -1101,8 +1102,9 @@ static int papr_scm_probe(struct platform_device *pdev)
u32 drc_index, metadata_size;
u64 blocks, block_size;
struct papr_scm_priv *p;
+   u8 uuid_raw[UUID_SIZE];
const char *uuid_str;
-   u64 uuid[2];
+   uuid_t uuid;
int rc;
 
/* check we have all the required DT properties */
@@ -1145,16 +1147,23 @@ static int papr_scm_probe(struct platform_device *pdev)
p->hcall_flush_required = of_property_read_bool(dn, 
"ibm,hcall-flush-required");
 
/* We just need to ensure that set cookies are unique across */
-   uuid_parse(uuid_str, (uuid_t *) uuid);
+   uuid_parse(uuid_str, &uuid);
+
/*
-* cookie1 and cookie2 are not really little endian
-* we store a little endian representation of the
-* uuid str so that we can compare this with the label
-* area cookie irrespective of the endian config with which
-* the kernel is built.
+* The cookie1 and cookie2 are not really little endian.
+* We store a raw buffer representation of the
+* uuid string so that we can compare this with the label
+* area cookie irrespective of the endian configuration
+* with which the kernel is built.
+*
+* Historically we stored the cookie in the below format.
+* for a uuid string 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa
+*  cookie1 was 0xfd423b0b671b5172
+*  cookie2 was 0xaabce8cae35b1d8d
 */
-   p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
-   p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
+   export_uuid(uuid_raw, &uuid);
+   p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]);
+   p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]);
 
/* might be zero */
p->metadata_size = metadata_size;
-- 
2.30.2
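
To make the new comment concrete, a small self-contained sketch
(illustrative only, not part of the patch) of what the raw-buffer form
computes:

#include <linux/uuid.h>
#include <asm/unaligned.h>

static void show_cookie_layout(void)
{
	u8 raw[UUID_SIZE];
	uuid_t uuid;
	u64 cookie1, cookie2;

	uuid_parse("72511b67-0b3b-42fd-8d1d-5be3cae8bcaa", &uuid);
	/* export_uuid() emits the string's bytes in order:
	 * 72 51 1b 67 0b 3b 42 fd 8d 1d 5b e3 ca e8 bc aa */
	export_uuid(raw, &uuid);

	cookie1 = get_unaligned_le64(&raw[0]);	/* 0xfd423b0b671b5172 */
	cookie2 = get_unaligned_le64(&raw[8]);	/* 0xaabce8cae35b1d8d */

	/* The same cookie values fall out on big- and little-endian
	 * kernels, since the byte order of raw[] is fixed. */
}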



Re: [PATCH v1 1/1] powerpc/papr_scm: Properly handle UUID types and API

2021-06-16 Thread Andy Shevchenko
On Fri, Apr 16, 2021 at 03:05:31PM +0530, Aneesh Kumar K.V wrote:
> On 4/16/21 2:39 PM, Andy Shevchenko wrote:
> > On Fri, Apr 16, 2021 at 01:28:21PM +0530, Aneesh Kumar K.V wrote:
> > > On 4/15/21 7:16 PM, Andy Shevchenko wrote:
> > > > Parse to and export from UUID own type, before dereferencing.
> > > > This also fixes wrong comment (Little Endian UUID is something else)
> > > > and should fix Sparse warnings about assigning strict types to POD.
> > > > 
> > > > Fixes: 43001c52b603 ("powerpc/papr_scm: Use ibm,unit-guid as the iset 
> > > > cookie")
> > > > Fixes: 259a948c4ba1 ("powerpc/pseries/scm: Use a specific endian format 
> > > > for storing uuid from the device tree")
> > > > Cc: Oliver O'Halloran 
> > > > Cc: Aneesh Kumar K.V 
> > > > Signed-off-by: Andy Shevchenko 
> > > > ---
> > > > Not tested
> > > >arch/powerpc/platforms/pseries/papr_scm.c | 13 -
> > > >1 file changed, 8 insertions(+), 5 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
> > > > b/arch/powerpc/platforms/pseries/papr_scm.c
> > > > index ae6f5d80d5ce..4366e1902890 100644
> > > > --- a/arch/powerpc/platforms/pseries/papr_scm.c
> > > > +++ b/arch/powerpc/platforms/pseries/papr_scm.c
> > > > @@ -1085,8 +1085,9 @@ static int papr_scm_probe(struct platform_device 
> > > > *pdev)
> > > > u32 drc_index, metadata_size;
> > > > u64 blocks, block_size;
> > > > struct papr_scm_priv *p;
> > > > +   u8 uuid_raw[UUID_SIZE];
> > > > const char *uuid_str;
> > > > -   u64 uuid[2];
> > > > +   uuid_t uuid;
> > > > int rc;
> > > > /* check we have all the required DT properties */
> > > > @@ -1129,16 +1130,18 @@ static int papr_scm_probe(struct 
> > > > platform_device *pdev)
> > > > p->hcall_flush_required = of_property_read_bool(dn, 
> > > > "ibm,hcall-flush-required");
> > > > /* We just need to ensure that set cookies are unique across */
> > > > -   uuid_parse(uuid_str, (uuid_t *) uuid);
> > > > +   uuid_parse(uuid_str, &uuid);
> > > > +
> > > > /*
> > > >  * cookie1 and cookie2 are not really little endian
> > > > -* we store a little endian representation of the
> > > > +* we store a raw buffer representation of the
> > > >  * uuid str so that we can compare this with the label
> > > >  * area cookie irrespective of the endian config with which
> > > >  * the kernel is built.
> > > >  */
> > > > -   p->nd_set.cookie1 = cpu_to_le64(uuid[0]);
> > > > -   p->nd_set.cookie2 = cpu_to_le64(uuid[1]);
> > > > +   export_uuid(uuid_raw, &uuid);
> > > > +   p->nd_set.cookie1 = get_unaligned_le64(&uuid_raw[0]);
> > > > +   p->nd_set.cookie2 = get_unaligned_le64(&uuid_raw[8]);
> > > 
> > > ok that does the equivalent of cpu_to_le64 there. So we are good. But the
> > > comment update is missing the details why we did that get_unaligned_le64.
> > > Maybe raw buffer representation is the correct term?
> > > Should we add an example in the comment. ie,
> > 
> > > /*
> > >   * Historically we stored the cookie in the below format.
> > > for a uuid str 72511b67-0b3b-42fd-8d1d-5be3cae8bcaa
> > > cookie1 was  0xfd423b0b671b5172 cookie2 was 0xaabce8cae35b1d8d
> > > */
> > 
> > I'm fine with the comment. At least it will shed a light on the byte 
> > ordering
> > we are expecting.
> > 
> 
> Will you be sending an update? Also it will be good to list the sparse
> warning in the commit message?

I'll send an update but rephrase to remove the mention of Sparse. I have no
Sparse build for this architecture.

If you have one, try to build with `make W=1 C=1 CF=-D__CHECK_ENDIAN__ ...`
which will enable warnings about restricted types assignment.

-- 
With Best Regards,
Andy Shevchenko




Re: [PATCH 07/11] powerpc: Add support for microwatt's hardware random number generator

2021-06-16 Thread Michael Ellerman
Nicholas Piggin  writes:
> Excerpts from Paul Mackerras's message of June 15, 2021 9:02 am:
>> This is accessed using the DARN instruction and should probably be
>> done more generically.
>> 
>> Signed-off-by: Paul Mackerras 
>> ---
>>  arch/powerpc/include/asm/archrandom.h | 12 +-
>>  arch/powerpc/platforms/microwatt/Kconfig  |  1 +
>>  arch/powerpc/platforms/microwatt/Makefile |  2 +-
>>  arch/powerpc/platforms/microwatt/rng.c| 48 +++
>>  4 files changed, 61 insertions(+), 2 deletions(-)
>>  create mode 100644 arch/powerpc/platforms/microwatt/rng.c
>> 
>> diff --git a/arch/powerpc/include/asm/archrandom.h 
>> b/arch/powerpc/include/asm/archrandom.h
>> index 9a53e29680f4..e8ae0f7740f9 100644
>> --- a/arch/powerpc/include/asm/archrandom.h
>> +++ b/arch/powerpc/include/asm/archrandom.h
>> @@ -8,12 +8,22 @@
>>  
>>  static inline bool __must_check arch_get_random_long(unsigned long *v)
>>  {
>> +if (ppc_md.get_random_seed)
>> +return ppc_md.get_random_seed(v);
>> +
>>  return false;
>>  }
>>  
>>  static inline bool __must_check arch_get_random_int(unsigned int *v)
>>  {
>> -return false;
>> +unsigned long val;
>> +bool rc;
>> +
>> +rc = arch_get_random_long(&val);
>> +if (rc)
>> +*v = val;
>> +
>> +return rc;
>>  }
>>  
>
> I would be happier if you didn't change this (or at least put it in its 
> own patch explaining why it's not going to slow down other platforms).

It would essentially be a revert of 01c9348c7620 ("powerpc: Use hardware
RNG for arch_get_random_seed_* not arch_get_random_*")

Which would be ironic :)

cheers


Re: [PATCH v13 1/3] kasan: allow an architecture to disable inline instrumentation

2021-06-16 Thread Marco Elver
On Wed, 16 Jun 2021 at 10:02, Daniel Axtens  wrote:
>
> For annoying architectural reasons, it's very difficult to support inline
> instrumentation on powerpc64.*
>
> Add a Kconfig flag to allow an arch to disable inline. (It's a bit
> annoying to be 'backwards', but I'm not aware of any way to have
> an arch force a symbol to be 'n', rather than 'y'.)
>
> We also disable stack instrumentation in this case as it does things that
> are functionally equivalent to inline instrumentation, namely adding
> code that touches the shadow directly without going through a C helper.
>
> * on ppc64 atm, the shadow lives in virtual memory and isn't accessible in
> real mode. However, before we turn on virtual memory, we parse the device
> tree to determine which platform and MMU we're running under. That calls
> generic DT code, which is instrumented. Inline instrumentation in DT would
> unconditionally attempt to touch the shadow region, which we won't have
> set up yet, and would crash. We can make outline mode wait for the arch to
> be ready, but we can't change what the compiler inserts for inline mode.
>
> Signed-off-by: Daniel Axtens 

Reviewed-by: Marco Elver 

Thank you.

> ---
>  lib/Kconfig.kasan | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
> index cffc2ebbf185..cb5e02d09e11 100644
> --- a/lib/Kconfig.kasan
> +++ b/lib/Kconfig.kasan
> @@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
>  config HAVE_ARCH_KASAN_VMALLOC
> bool
>
> +config ARCH_DISABLE_KASAN_INLINE
> +   bool
> +   help
> + Sometimes an architecture might not be able to support inline
> + instrumentation but might be able to support outline 
> instrumentation.
> + This option allows an architecture to prevent inline and stack
> + instrumentation from being enabled.
> +
> +
>  config CC_HAS_KASAN_GENERIC
> def_bool $(cc-option, -fsanitize=kernel-address)
>
> @@ -130,6 +139,7 @@ config KASAN_OUTLINE
>
>  config KASAN_INLINE
> bool "Inline instrumentation"
> +   depends on !ARCH_DISABLE_KASAN_INLINE
> help
>   Compiler directly inserts code checking shadow memory before
>   memory accesses. This is faster than outline (in some workloads
> @@ -141,6 +151,7 @@ endchoice
>  config KASAN_STACK
> bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
> !COMPILE_TEST
> depends on KASAN_GENERIC || KASAN_SW_TAGS
> +   depends on !ARCH_DISABLE_KASAN_INLINE
> default y if CC_IS_GCC
> help
>   The LLVM stack address sanitizer has a known problem that
> @@ -154,6 +165,9 @@ config KASAN_STACK
>   but clang users can still enable it for builds without
>   CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
>   to use and enabled by default.
> + If the architecture disables inline instrumentation, this is
> + also disabled as it adds inline-style instrumentation that
> + is run unconditionally.
>
>  config KASAN_SW_TAGS_IDENTIFY
> bool "Enable memory corruption identification"
> --
> 2.30.2
>


Re: [PATCH v13 3/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2021-06-16 Thread Marco Elver
On Wed, 16 Jun 2021 at 10:03, Daniel Axtens  wrote:
[...]
> diff --git a/include/linux/kasan.h b/include/linux/kasan.h
> index 768d7d342757..fd65f477ac92 100644
> --- a/include/linux/kasan.h
> +++ b/include/linux/kasan.h
> @@ -40,10 +40,22 @@ struct kunit_kasan_expectation {
>  #define PTE_HWTABLE_PTRS 0
>  #endif
>
> +#ifndef MAX_PTRS_PER_PTE
> +#define MAX_PTRS_PER_PTE PTRS_PER_PTE
> +#endif
> +
> +#ifndef MAX_PTRS_PER_PMD
> +#define MAX_PTRS_PER_PMD PTRS_PER_PMD
> +#endif
> +
> +#ifndef MAX_PTRS_PER_PUD
> +#define MAX_PTRS_PER_PUD PTRS_PER_PUD
> +#endif

This is introducing new global constants in a <linux/kasan.h> header. It
feels like this should be in <linux/pgtable.h> together with a
comment. Because <linux/pgtable.h> is actually included in
<linux/kasan.h>, most of the kernel will get these new definitions.
That in itself is fine, but it feels wrong that the KASAN header
introduces these.

Thoughts?

Sorry for only realizing this now.

Thanks,
-- Marco

>  extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
> -extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
> -extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
> -extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
> +extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
> +extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
> +extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
>  extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
>
>  int kasan_populate_early_shadow(const void *shadow_start,
> diff --git a/mm/kasan/init.c b/mm/kasan/init.c
> index 348f31d15a97..cc64ed6858c6 100644
> --- a/mm/kasan/init.c
> +++ b/mm/kasan/init.c
> @@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
>  }
>  #endif
>  #if CONFIG_PGTABLE_LEVELS > 3
> -pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
> +pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
>  static inline bool kasan_pud_table(p4d_t p4d)
>  {
> return p4d_page(p4d) == 
> virt_to_page(lm_alias(kasan_early_shadow_pud));
> @@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
>  }
>  #endif
>  #if CONFIG_PGTABLE_LEVELS > 2
> -pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
> +pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
>  static inline bool kasan_pmd_table(pud_t pud)
>  {
> return pud_page(pud) == 
> virt_to_page(lm_alias(kasan_early_shadow_pmd));
> @@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
> return false;
>  }
>  #endif
> -pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
> +pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
> __page_aligned_bss;
>
>  static inline bool kasan_pte_table(pmd_t pmd)
> --
> 2.30.2
>


Re: [PATCH v2 00/12] powerpc: Cleanup use of 'struct ppc_inst'

2021-06-16 Thread Michael Ellerman
Christophe Leroy  writes:
> Le 15/06/2021 à 09:18, Michael Ellerman a écrit :
>> Christophe Leroy  writes:
>>> This series is a cleanup of the use of 'struct ppc_inst'.
>>>
>>> A confusion is made between internal representation of powerpc
>>> instructions with 'struct ppc_inst' and in-memory code which is
>>> and will always be an array of 'unsigned int'.
>> 
>> Why don't we use u32 *, to make it even more explicit what the expected
>> size is?
>> 
>
> I guess that's historical, we could use u32 *

Yeah I think it is historical, we just never thought about it much.

> We can convert it incrementaly maybe ?

I've still got this series in next-test, so I'll go through it and
change any uses of unsigned int * to u32 *, and then we can do another
pass later to change the remaining cases.

cheers


Re: [PATCH v13 2/3] kasan: allow architectures to provide an outline readiness check

2021-06-16 Thread Marco Elver
On Wed, 16 Jun 2021 at 10:02, Daniel Axtens  wrote:
> Allow architectures to define a kasan_arch_is_ready() hook that bails
> out of any function that's about to touch the shadow unless the arch
> says that it is ready for the memory to be accessed. This is fairly
> uninvasive and should have a negligible performance penalty.
>
> This will only work in outline mode, so an arch must specify
> ARCH_DISABLE_KASAN_INLINE if it requires this.
>
> Cc: Balbir Singh 
> Cc: Aneesh Kumar K.V 
> Suggested-by: Christophe Leroy 
> Signed-off-by: Daniel Axtens 

Reviewed-by: Marco Elver 

but also check if an assertion that this is only used with
KASAN_GENERIC might make sense (below). Depends on how much we want to
make sure kasan_arch_is_ready() could be useful for other modes (which
I don't think makes sense).

> --
>
> I discuss the justification for this later in the series. Also,
> both previous RFCs for ppc64 - by 2 different people - have
> needed this trick! See:
>  - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
>  - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
> ---
>  mm/kasan/common.c  | 4 
>  mm/kasan/generic.c | 3 +++
>  mm/kasan/kasan.h   | 4 
>  mm/kasan/shadow.c  | 8 
>  4 files changed, 19 insertions(+)
>
> diff --git a/mm/kasan/common.c b/mm/kasan/common.c
> index 10177cc26d06..0ad615f3801d 100644
> --- a/mm/kasan/common.c
> +++ b/mm/kasan/common.c
> @@ -331,6 +331,10 @@ static inline bool kasan_slab_free(struct kmem_cache 
> *cache, void *object,
> u8 tag;
> void *tagged_object;
>
> +   /* Bail if the arch isn't ready */
> +   if (!kasan_arch_is_ready())
> +   return false;
> +
> tag = get_tag(object);
> tagged_object = object;
> object = kasan_reset_tag(object);
> diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
> index 53cbf28859b5..c3f5ba7a294a 100644
> --- a/mm/kasan/generic.c
> +++ b/mm/kasan/generic.c
> @@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned 
> long addr,
> size_t size, bool write,
> unsigned long ret_ip)
>  {
> +   if (!kasan_arch_is_ready())
> +   return true;
> +
> if (unlikely(size == 0))
> return true;
>
> diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
> index 8f450bc28045..19323a3d5975 100644
> --- a/mm/kasan/kasan.h
> +++ b/mm/kasan/kasan.h
> @@ -449,6 +449,10 @@ static inline void kasan_poison_last_granule(const void 
> *address, size_t size) {
>
>  #endif /* CONFIG_KASAN_GENERIC */
>
> +#ifndef kasan_arch_is_ready
> +static inline bool kasan_arch_is_ready(void)   { return true; }
> +#endif
> +

I've been trying to think of a way to make it clear this is only for
KASAN_GENERIC mode, and not the others. An arch can always define this
function, but of course it might not be used. One way would be to add
an '#ifndef CONFIG_KASAN_GENERIC' in the #else case and #error if it's
not generic mode.

I think trying to make this do anything useful for SW_TAGS or HW_TAGS
modes does not make sense (at least right now).
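
A sketch of the guard being suggested (one possible shape, not taken
from the patch):

/* mm/kasan/kasan.h -- illustrative only */
#if defined(CONFIG_KASAN_GENERIC)
#ifndef kasan_arch_is_ready
static inline bool kasan_arch_is_ready(void)	{ return true; }
#endif
#elif defined(kasan_arch_is_ready)
#error kasan_arch_is_ready() is only supported by generic KASAN
#endif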

>  /*
>   * Exported functions for interfaces called from assembly or from generated
>   * code. Declarations here to avoid warning about missing declarations.
> diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
> index 082ee5b6d9a1..3c7f7efe6f68 100644
> --- a/mm/kasan/shadow.c
> +++ b/mm/kasan/shadow.c
> @@ -73,6 +73,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, 
> bool init)
>  {
> void *shadow_start, *shadow_end;
>
> +   /* Don't touch the shadow memory if arch isn't ready */
> +   if (!kasan_arch_is_ready())
> +   return;
> +
> /*
>  * Perform shadow offset calculation based on untagged address, as
>  * some of the callers (e.g. kasan_poison_object_data) pass tagged
> @@ -99,6 +103,10 @@ EXPORT_SYMBOL(kasan_poison);
>  #ifdef CONFIG_KASAN_GENERIC
>  void kasan_poison_last_granule(const void *addr, size_t size)
>  {
> +   /* Don't touch the shadow memory if arch isn't ready */
> +   if (!kasan_arch_is_ready())
> +   return;
> +
> if (size & KASAN_GRANULE_MASK) {
> u8 *shadow = (u8 *)kasan_mem_to_shadow(addr + size);
> *shadow = size & KASAN_GRANULE_MASK;
> --
> 2.30.2
>


Re: [PATCH v4 1/2] module: add elf_check_module_arch for module specific elf arch checks

2021-06-16 Thread Jessica Yu

+++ Nicholas Piggin [16/06/21 11:18 +1000]:

Excerpts from Jessica Yu's message of June 15, 2021 10:17 pm:

+++ Nicholas Piggin [15/06/21 12:05 +1000]:

Excerpts from Jessica Yu's message of June 14, 2021 10:06 pm:

+++ Nicholas Piggin [11/06/21 19:39 +1000]:

The elf_check_arch() function is used to test usermode binaries, but
kernel modules may have more specific requirements. powerpc would like
to test for ABI version compatibility.

Add an arch-overridable function elf_check_module_arch() that defaults
to elf_check_arch() and use it in elf_validity_check().

Signed-off-by: Michael Ellerman 
[np: split patch, added changelog]
Signed-off-by: Nicholas Piggin 
---
include/linux/moduleloader.h | 5 +
kernel/module.c  | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..fdc042a84562 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -13,6 +13,11 @@
 * must be implemented by each architecture.
 */

+// Allow arch to optionally do additional checking of module ELF header
+#ifndef elf_check_module_arch
+#define elf_check_module_arch elf_check_arch
+#endif


Hi Nicholas,

Why not make elf_check_module_arch() consistent with the other
arch-specific functions? Please see module_frob_arch_sections(),
module_{init,exit}_section(), etc in moduleloader.h. That is, they are
all __weak functions that are overridable by arches. We can maybe make
elf_check_module_arch() a weak symbol, available for arches to
override if they want to perform additional elf checks. Then we don't
have to have this one-off #define.



Like this? I like it. Good idea.


Yeah! Also, maybe we can alternatively make elf_check_module_arch() a
separate check entirely so that the powerpc implementation doesn't
have to include that extra elf_check_arch() call. Something like this maybe?


Yeah we can do that. Would you be okay if it goes via powerpc tree? If
yes, then we should get your Ack (or SOB because it seems to be entirely
your patch now :D)


This can go through the powerpc tree. Will you do another respin
of this patch? And yes, feel free to take my SOB for this one -

Signed-off-by: Jessica Yu 

Thanks!

Jessica


Re: [PATCH v12 00/12] Restricted DMA

2021-06-16 Thread Will Deacon
Hi Claire,

On Wed, Jun 16, 2021 at 02:21:45PM +0800, Claire Chang wrote:
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
> 
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
> 
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
> 
> [1a] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> [1b] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> [2] https://blade.tencent.com/en/advisories/qualpwn/
> [3] 
> https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> [4] 
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> 
> v12:
> Split is_dev_swiotlb_force into is_swiotlb_force_bounce (patch 06/12) and
> is_swiotlb_for_alloc (patch 09/12)

I took this for a spin in an arm64 KVM guest with virtio devices using the
DMA API and it works as expected on top of swiotlb devel/for-linus-5.14, so:

Tested-by: Will Deacon 

Thanks!

Will


Re: [PATCH v2 0/4] Add perf interface to expose nvdimm

2021-06-16 Thread Nageswara Sastry



> On 14-Jun-2021, at 10:53 AM, Kajol Jain  wrote:
> 
> Patchset adds performance stats reporting support for nvdimm.
> Added interface includes support for pmu register/unregister
> functions. A structure is added called nvdimm_pmu to be used for
> adding arch/platform specific data such as supported events, cpumask
> pmu event functions like event_init/add/read/del.
> User could use the standard perf tool to access perf
> events exposed via pmu.
> 
> Added implementation to expose IBM pseries platform nmem*
> device performance stats using this interface.
> ...
> 
> Patch1:
>Introduces the nvdimm_pmu structure
> Patch2:
>   Adds common interface to add arch/platform specific data
>   includes supported events, pmu event functions. It also
>   adds code for cpu hotplug support.
> Patch3:
>Add code in arch/powerpc/platform/pseries/papr_scm.c to expose
>nmem* pmu. It fills in the nvdimm_pmu structure with event attrs
>cpumask andevent functions and then registers the pmu by adding
>callbacks to register_nvdimm_pmu.
> Patch4:
>Sysfs documentation patch

Tested with the following scenarios:
1. Check dmesg for nmem PMU registered messages.
2. Listed nmem events using 'perf list and perf list nmem'
3. Ran 'perf stat' with single event, grouping events, events from same pmu,
   different pmu and invalid events
4. Read from sysfs files, Writing in to sysfs files
5. While running nmem events with perf stat, offline cpu from the nmem?/cpumask

While running the above, functionality worked as expected; no error messages
were seen in dmesg.

Tested-by: Nageswara R Sastry 

> 
> Changelog
> ---
> PATCH v1 -> PATCH v2
> - Fix hotplug code by adding pmu migration call
>  in case the current designated cpu goes offline. As
>  pointed out by Peter Zijlstra.
> 
> - Removed the retun -1 part from cpu hotplug offline
>  function.
> 
> - Link to the previous patchset : https://lkml.org/lkml/2021/6/8/500
> ---
> Kajol Jain (4):
>  drivers/nvdimm: Add nvdimm pmu structure
>  drivers/nvdimm: Add perf interface to expose nvdimm performance stats
>  powerpc/papr_scm: Add perf interface support
>  powerpc/papr_scm: Document papr_scm sysfs event format entries
> 
> Documentation/ABI/testing/sysfs-bus-papr-pmem |  31 ++
> arch/powerpc/include/asm/device.h |   5 +
> arch/powerpc/platforms/pseries/papr_scm.c | 365 ++
> drivers/nvdimm/Makefile   |   1 +
> drivers/nvdimm/nd_perf.c  | 230 +++
> include/linux/nd.h|  46 +++
> 6 files changed, 678 insertions(+)
> create mode 100644 drivers/nvdimm/nd_perf.c
> 
Thanks and Regards,
R.Nageswara Sastry

> 
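
For reference, an illustrative sketch of the interface the cover letter
describes; register_nvdimm_pmu()/unregister_nvdimm_pmu() are named in
the patches, while the exact fields shown here are assumptions:

struct nvdimm_pmu {
	struct pmu pmu;			/* registered with the perf core */
	struct device *dev;
	int cpu;			/* designated CPU, migrated on hotplug */
	struct hlist_node node;		/* cpuhp callback list entry */
};

int register_nvdimm_pmu(struct nvdimm_pmu *nd_pmu, struct platform_device *pdev);
void unregister_nvdimm_pmu(struct nvdimm_pmu *nd_pmu);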



Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-16 Thread Will Deacon
On Tue, Jun 15, 2021 at 08:21:13PM -0700, Andy Lutomirski wrote:
> The old sync_core_before_usermode() comments suggested that a 
> non-icache-syncing
> return-to-usermode instruction is x86-specific and that all other
> architectures automatically notice cross-modified code on return to
> userspace.
> 
> This is misleading.  The incantation needed to modify code from one
> CPU and execute it on another CPU is highly architecture dependent.
> On x86, according to the SDM, one must modify the code, issue SFENCE
> if the modification was WC or nontemporal, and then issue a "serializing
> instruction" on the CPU that will execute the code.  membarrier() can do
> the latter.
> 
> On arm64 and powerpc, one must flush the icache and then flush the pipeline
> on the target CPU, although the CPU manuals don't necessarily use this
> language.
> 
> So let's drop any pretense that we can have a generic way to define or
> implement membarrier's SYNC_CORE operation and instead require all
> architectures to define the helper and supply their own documentation as to
> how to use it.  This means x86, arm64, and powerpc for now.  Let's also
> rename the function from sync_core_before_usermode() to
> membarrier_sync_core_before_usermode() because the precise flushing details
> may very well be specific to membarrier, and even the concept of
> "sync_core" in the kernel is mostly an x86-ism.
> 
> (It may well be the case that, on real x86 processors, synchronizing the
>  icache (which requires no action at all) and "flushing the pipeline" is
>  sufficient, but trying to use this language would be confusing at best.
>  LFENCE does something awfully like "flushing the pipeline", but the SDM
>  does not permit LFENCE as an alternative to a "serializing instruction"
>  for this purpose.)
> 
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: Nicholas Piggin 
> Cc: Catalin Marinas 
> Cc: Will Deacon 
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: Mathieu Desnoyers 
> Cc: Nicholas Piggin 
> Cc: Peter Zijlstra 
> Cc: x...@kernel.org
> Cc: sta...@vger.kernel.org
> Fixes: 70216e18e519 ("membarrier: Provide core serializing command, 
> *_SYNC_CORE")
> Signed-off-by: Andy Lutomirski 
> ---
>  .../membarrier-sync-core/arch-support.txt | 68 ++-
>  arch/arm64/include/asm/sync_core.h| 19 ++
>  arch/powerpc/include/asm/sync_core.h  | 14 
>  arch/x86/Kconfig  |  1 -
>  arch/x86/include/asm/sync_core.h  |  7 +-
>  arch/x86/kernel/alternative.c |  2 +-
>  arch/x86/kernel/cpu/mce/core.c|  2 +-
>  arch/x86/mm/tlb.c |  3 +-
>  drivers/misc/sgi-gru/grufault.c   |  2 +-
>  drivers/misc/sgi-gru/gruhandles.c |  2 +-
>  drivers/misc/sgi-gru/grukservices.c   |  2 +-
>  include/linux/sched/mm.h  |  1 -
>  include/linux/sync_core.h | 21 --
>  init/Kconfig  |  3 -
>  kernel/sched/membarrier.c | 15 ++--
>  15 files changed, 75 insertions(+), 87 deletions(-)
>  create mode 100644 arch/arm64/include/asm/sync_core.h
>  create mode 100644 arch/powerpc/include/asm/sync_core.h
>  delete mode 100644 include/linux/sync_core.h

For the arm64 bits (docs and asm/sync_core.h):

Acked-by: Will Deacon 

Will
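
For readers who want the end-to-end picture, a minimal userspace sketch
of the cross-modifying-code protocol the commit message describes
(assuming a kernel with membarrier SYNC_CORE support; error handling
elided):

#include <linux/membarrier.h>
#include <sys/syscall.h>
#include <unistd.h>

static int membarrier(int cmd, unsigned int flags, int cpu_id)
{
	return syscall(__NR_membarrier, cmd, flags, cpu_id);
}

int main(void)
{
	/* Register once before using the expedited command. */
	if (membarrier(MEMBARRIER_CMD_REGISTER_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0))
		return 1;

	/* ... one thread writes new instructions to a code buffer,
	 * plus whatever cache maintenance the architecture needs ... */

	/* Guarantee core serialization on every thread of this process
	 * before it next returns to usermode, so all threads execute
	 * the new code. */
	if (membarrier(MEMBARRIER_CMD_PRIVATE_EXPEDITED_SYNC_CORE, 0, 0))
		return 1;
	return 0;
}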


Re: [PATCH v5 13/17] powerpc/pseries/vas: Setup IRQ and fault handling

2021-06-16 Thread Nicholas Piggin
Excerpts from Haren Myneni's message of June 15, 2021 7:01 pm:
> On Mon, 2021-06-14 at 13:07 +1000, Nicholas Piggin wrote:
>> Excerpts from Haren Myneni's message of June 13, 2021 9:02 pm:
>> > NX generates an interrupt when sees a fault on the user space
>> > buffer and the hypervisor forwards that interrupt to OS. Then
>> > the kernel handles the interrupt by issuing H_GET_NX_FAULT hcall
>> > to retrieve the fault CRB information.
>> > 
>> > This patch also adds changes to setup and free IRQ per each
>> > window and also handles the fault by updating the CSB.
>> > 
>> > Signed-off-by: Haren Myneni 
>> > ---
>> >  arch/powerpc/platforms/pseries/vas.c | 108
>> > +++
>> >  1 file changed, 108 insertions(+)
>> > 
>> > diff --git a/arch/powerpc/platforms/pseries/vas.c
>> > b/arch/powerpc/platforms/pseries/vas.c
>> > index fe375f7a7029..55185bdd3776 100644
>> > --- a/arch/powerpc/platforms/pseries/vas.c
>> > +++ b/arch/powerpc/platforms/pseries/vas.c
>> > @@ -11,6 +11,7 @@
>> >  #include 
>> >  #include 
>> >  #include 
>> > +#include 
>> >  #include 
>> >  #include 
>> >  #include 
>> > @@ -190,6 +191,58 @@ int h_query_vas_capabilities(const u64 hcall,
>> > u8 query_type, u64 result)
>> >  }
>> >  EXPORT_SYMBOL_GPL(h_query_vas_capabilities);
>> >  
>> > +/*
>> > + * hcall to get fault CRB from pHyp.
>> > + */
>> > +static int h_get_nx_fault(u32 winid, u64 buffer)
>> > +{
>> > +  long rc;
>> > +
>> > +  rc = plpar_hcall_norets(H_GET_NX_FAULT, winid, buffer);
>> > +
>> > +  switch (rc) {
>> > +  case H_SUCCESS:
>> > +  return 0;
>> > +  case H_PARAMETER:
>> > +  pr_err("HCALL(%x): Invalid window ID %u\n",
>> > H_GET_NX_FAULT,
>> > + winid);
>> > +  return -EINVAL;
>> > +  case H_PRIVILEGE:
>> > +  pr_err("HCALL(%x): Window(%u): Invalid fault buffer
>> > 0x%llx\n",
>> > + H_GET_NX_FAULT, winid, buffer);
>> > +  return -EACCES;
>> > +  default:
>> > +  pr_err("HCALL(%x): Failed with error %ld for
>> > window(%u)\n",
>> > + H_GET_NX_FAULT, rc, winid);
>> > +  return -EIO;
>> 
>> 3 error messages have 3 different formats for window ID.
>> 
>> I agree with Michael you could just have one error message that
>> reports 
>> the return value. Also "H_GET_NX_FAULT: " would be nicer than
>> "HCALL(380): "
> 
> yes, Added just one printk for all error codes except for errors which
> depend on arguments to HCALL (Ex: WinID).
> 
> Sure, I will add just one error message and print all arguments passed
> to HCALL. 
> 
> pr_err("H_GET_NX_FAULT: window(%u), fault buffer(0x%llx) failed with
> error %ld\n", winid, buffer, rc);

Thanks.

>> 
>> Check how some other hcall failures are reported, "hcall failed: 
>> H_CALL_NAME" seems to have a few takers.
>> 
>> > +  }
>> > +}
>> > +
>> > +/*
>> > + * Handle the fault interrupt.
>> > + * When the fault interrupt is received for each window, query
>> > pHyp to get
>> > + * the fault CRB on the specific fault. Then process the CRB by
>> > updating
>> > + * CSB or send signal if the user space CSB is invalid.
>> > + * Note: pHyp forwards an interrupt for each fault request. So one
>> > fault
>> > + *CRB to process for each H_GET_NX_FAULT HCALL.
>> > + */
>> > +irqreturn_t pseries_vas_fault_thread_fn(int irq, void *data)
>> > +{
>> > +  struct pseries_vas_window *txwin = data;
>> > +  struct coprocessor_request_block crb;
>> > +  struct vas_user_win_ref *tsk_ref;
>> > +  int rc;
>> > +
>> > +  rc = h_get_nx_fault(txwin->vas_win.winid,
>> > (u64)virt_to_phys());
>> > +  if (!rc) {
>> > +  tsk_ref = >vas_win.task_ref;
>> > +  vas_dump_crb();
>> > +  vas_update_csb(, tsk_ref);
>> > +  }
>> > +
>> > +  return IRQ_HANDLED;
>> > +}
>> > +
>> >  /*
>> >   * Allocate window and setup IRQ mapping.
>> >   */
>> > @@ -201,10 +254,51 @@ static int allocate_setup_window(struct
>> > pseries_vas_window *txwin,
>> >rc = h_allocate_vas_window(txwin, domain, wintype,
>> > DEF_WIN_CREDS);
>> >if (rc)
>> >return rc;
>> > +  /*
>> > +   * On powerVM, pHyp setup and forwards the fault interrupt per
>> 
>>The hypervisor forwards the fault interrupt per-window...
>> 
>> > +   * window. So the IRQ setup and fault handling will be done for
>> > +   * each open window separately.
>> > +   */
>> > +  txwin->fault_virq = irq_create_mapping(NULL, txwin->fault_irq);
>> > +  if (!txwin->fault_virq) {
>> > +  pr_err("Failed irq mapping %d\n", txwin->fault_irq);
>> > +  rc = -EINVAL;
>> > +  goto out_win;
>> > +  }
>> > +
>> > +  txwin->name = kasprintf(GFP_KERNEL, "vas-win-%d",
>> > +  txwin->vas_win.winid);
>> > +  if (!txwin->name) {
>> > +  rc = -ENOMEM;
>> > +  goto out_irq;
>> > +  }
>> > +
>> > +  rc = request_threaded_irq(txwin->fault_virq, NULL,
>> > +pseries_vas_fault_thread_fn,
>> > IRQF_ONESHOT,
>> > +

Re: [PATCH v5 12/17] powerpc/pseries/vas: Integrate API with open/close windows

2021-06-16 Thread Nicholas Piggin
Excerpts from Haren Myneni's message of June 15, 2021 5:26 pm:
> On Mon, 2021-06-14 at 12:55 +1000, Nicholas Piggin wrote:
>> Excerpts from Haren Myneni's message of June 13, 2021 9:02 pm:
>> > This patch adds VAS window allocatioa/close with the corresponding
>> > hcalls. Also changes to integrate with the existing user space VAS
>> > API and provide register/unregister functions to NX pseries driver.
>> > 
>> > The driver register function is used to create the user space
>> > interface (/dev/crypto/nx-gzip) and unregister to remove this
>> > entry.
>> > 
>> > The user space process opens this device node and makes an ioctl
>> > to allocate VAS window. The close interface is used to deallocate
>> > window.
>> > 
>> > Signed-off-by: Haren Myneni 
>> > ---
>> >  arch/powerpc/include/asm/vas.h  |   4 +
>> >  arch/powerpc/platforms/pseries/Makefile |   1 +
>> >  arch/powerpc/platforms/pseries/vas.c| 223
>> > 
>> >  3 files changed, 228 insertions(+)
>> > 
>> > diff --git a/arch/powerpc/include/asm/vas.h
>> > b/arch/powerpc/include/asm/vas.h
>> > index eefc758d8cd4..9d5646d721c4 100644
>> > --- a/arch/powerpc/include/asm/vas.h
>> > +++ b/arch/powerpc/include/asm/vas.h
>> > @@ -254,6 +254,10 @@ struct vas_all_caps {
>> >u64 feat_type;
>> >  };
>> >  
>> > +int h_query_vas_capabilities(const u64 hcall, u8 query_type, u64
>> > result);
>> > +int vas_register_api_pseries(struct module *mod,
>> > +   enum vas_cop_type cop_type, const char
>> > *name);
>> > +void vas_unregister_api_pseries(void);
>> >  #endif
>> >  
>> >  /*
>> > diff --git a/arch/powerpc/platforms/pseries/Makefile
>> > b/arch/powerpc/platforms/pseries/Makefile
>> > index c8a2b0b05ac0..4cda0ef87be0 100644
>> > --- a/arch/powerpc/platforms/pseries/Makefile
>> > +++ b/arch/powerpc/platforms/pseries/Makefile
>> > @@ -30,3 +30,4 @@ obj-$(CONFIG_PPC_SVM)+= svm.o
>> >  obj-$(CONFIG_FA_DUMP) += rtas-fadump.o
>> >  
>> >  obj-$(CONFIG_SUSPEND) += suspend.o
>> > +obj-$(CONFIG_PPC_VAS) += vas.o
>> > diff --git a/arch/powerpc/platforms/pseries/vas.c
>> > b/arch/powerpc/platforms/pseries/vas.c
>> > index 98109a13f1c2..fe375f7a7029 100644
>> > --- a/arch/powerpc/platforms/pseries/vas.c
>> > +++ b/arch/powerpc/platforms/pseries/vas.c
>> > @@ -10,6 +10,7 @@
>> >  #include 
>> >  #include 
>> >  #include 
>> > +#include 
>> >  #include 
>> >  #include 
>> >  #include 
>> > @@ -187,6 +188,228 @@ int h_query_vas_capabilities(const u64 hcall,
>> > u8 query_type, u64 result)
>> >return -EIO;
>> >}
>> >  }
>> > +EXPORT_SYMBOL_GPL(h_query_vas_capabilities);
>> > +
>> > +/*
>> > + * Allocate window and setup IRQ mapping.
>> > + */
>> > +static int allocate_setup_window(struct pseries_vas_window *txwin,
>> > +   u64 *domain, u8 wintype)
>> > +{
>> > +  int rc;
>> > +
>> > +  rc = h_allocate_vas_window(txwin, domain, wintype,
>> > DEF_WIN_CREDS);
>> > +  if (rc)
>> > +  return rc;
>> > +
>> > +  txwin->vas_win.wcreds_max = DEF_WIN_CREDS;
>> > +
>> > +  return 0;
>> > +}
>> > +
>> > +static struct vas_window *vas_allocate_window(struct
>> > vas_tx_win_open_attr *uattr,
>> > +enum vas_cop_type
>> > cop_type)
>> > +{
>> > +  long domain[PLPAR_HCALL9_BUFSIZE] = {VAS_DEFAULT_DOMAIN_ID};
>> > +  struct vas_ct_caps *ct_caps;
>> > +  struct vas_caps *caps;
>> > +  struct pseries_vas_window *txwin;
>> > +  int rc;
>> > +
>> > +  txwin = kzalloc(sizeof(*txwin), GFP_KERNEL);
>> > +  if (!txwin)
>> > +  return ERR_PTR(-ENOMEM);
>> > +
>> > +  /*
>> > +   * A VAS window can have many credits which means that many
>> > +   * requests can be issued simultaneously. But phyp restricts
>> > +   * one credit per window.
>> > +   * phyp introduces 2 different types of credits:
>> > +   * Default credit type (Uses normal priority FIFO):
>> > +   *  A limited number of credits are assigned to partitions
>> > +   *  based on processor entitlement. But these credits may be
>> > +   *  over-committed on a system depends on whether the CPUs
>> > +   *  are in shared or dedicated modes - that is, more requests
>> > +   *  may be issued across the system than NX can service at
>> > +   *  once which can result in paste command failure (RMA_busy).
>> > +   *  Then the process has to resend requests or fall-back to
>> > +   *  SW compression.
>> > +   * Quality of Service (QoS) credit type (Uses high priority
>> > FIFO):
>> > +   *  To avoid NX HW contention, the system admins can assign
>> > +   *  QoS credits for each LPAR so that this partition is
>> > +   *  guaranteed access to NX resources. These credits are
>> > +   *  assigned to partitions via the HMC.
>> > +   *  Refer PAPR for more information.
>> > +   *
>> > +   * Allocate window with QoS credits if user requested.
>> > Otherwise
>> > +   * default credits are used.
>> > +   */
>> > 

Re: [PATCH v5 04/17] powerpc/vas: Add platform specific user window operations

2021-06-16 Thread Nicholas Piggin
Excerpts from Haren Myneni's message of June 15, 2021 4:37 pm:
> On Mon, 2021-06-14 at 12:24 +1000, Nicholas Piggin wrote:
>> Excerpts from Haren Myneni's message of June 13, 2021 8:57 pm:
>> > PowerNV uses registers to open/close VAS windows, and getting the
>> > paste address. Whereas the hypervisor calls are used on PowerVM.
>> > 
>> > This patch adds the platform specific user space window operations
>> > and register with the common VAS user space interface.
>> > 
>> > Signed-off-by: Haren Myneni 
>> > ---
>> >  arch/powerpc/include/asm/vas.h  | 14 +-
>> >  arch/powerpc/platforms/book3s/vas-api.c | 53 +--
>> > --
>> >  arch/powerpc/platforms/powernv/vas-window.c | 45 -
>> >  3 files changed, 89 insertions(+), 23 deletions(-)
>> > 
>> > diff --git a/arch/powerpc/include/asm/vas.h
>> > b/arch/powerpc/include/asm/vas.h
>> > index bab7891d43f5..85318d7446c7 100644
>> > --- a/arch/powerpc/include/asm/vas.h
>> > +++ b/arch/powerpc/include/asm/vas.h
>> > @@ -5,6 +5,7 @@
>> >  
>> >  #ifndef _ASM_POWERPC_VAS_H
>> >  #define _ASM_POWERPC_VAS_H
>> > +#include <uapi/asm/vas-api.h>
>> >  
>> >  struct vas_window;
>> >  
>> > @@ -48,6 +49,16 @@ enum vas_cop_type {
>> >VAS_COP_TYPE_MAX,
>> >  };
>> >  
>> > +/*
>> > + * User space window operations used for powernv and powerVM
>> > + */
>> > +struct vas_user_win_ops {
>> > +  struct vas_window * (*open_win)(struct vas_tx_win_open_attr *,
>> > +  enum vas_cop_type);
>> > +  u64 (*paste_addr)(struct vas_window *);
>> > +  int (*close_win)(struct vas_window *);
>> > +};
>> 
>> This looks better, but rather than pull in uapi and the user API 
>> structure here, could you just pass in vas_id and flags after the
>> common 
>> code does the user copy and verifies the version and other details?
>> 
>> I think it's generally good practice to limit the data that the user
>> can influence as much as possible. Sorry for not picking up on that
>> earlier.
> 
> The user space passes the vas_tx_win_open_attr struct - only vas_id and
> flags are used right now, but it can be extended in future with reserved
> elements. So the same struct is passed to the platform specific API.
> 
> do you prefer "struct vas_window * (*open_win)(vas_id, flags, cop)" and
> extend later when more elments are used?

Yes I think so. The reason being that you don't send data under the
control of the user very far into the kernel. Better safe than sorry.

Thanks,
Nick
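
A sketch of the narrowed hook Nick is suggesting (signature
illustrative; only vas_id, flags and the coprocessor type cross into
platform code, after the common code has done the user copy and
version check):

struct vas_user_win_ops {
	struct vas_window *(*open_win)(int vas_id, u64 flags,
				       enum vas_cop_type cop_type);
	u64 (*paste_addr)(struct vas_window *win);
	int (*close_win)(struct vas_window *win);
};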



[PATCH v3] lockdown, selinux: fix wrong subject in some SELinux lockdown checks

2021-06-16 Thread Ondrej Mosnacek
Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
lockdown") added an implementation of the locked_down LSM hook to
SELinux, with the aim to restrict which domains are allowed to perform
operations that would breach lockdown.

However, in several places the security_locked_down() hook is called in
situations where the current task isn't doing any action that would
directly breach lockdown, leading to SELinux checks that are basically
bogus.

To fix this, add an explicit struct cred pointer argument to
security_lockdown() and define NULL as a special value to pass instead
of current_cred() in such situations. LSMs that take the subject
credentials into account can then fall back to some default or ignore
such calls altogether. In the SELinux lockdown hook implementation, use
SECINITSID_KERNEL in case the cred argument is NULL.
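
As a sketch of the resulting shape (illustrative, not a verbatim
excerpt from the diff):

/* The hook now takes the acting credentials; NULL means "no
 * meaningful subject, make a global decision". */
int security_locked_down(const struct cred *cred, enum lockdown_reason what);

/* Typical caller, behavior unchanged: */
if (security_locked_down(current_cred(), LOCKDOWN_DEV_MEM))
	return -EPERM;

/* Caller with no relevant task context, e.g. xmon in interrupt
 * context: */
if (security_locked_down(NULL, LOCKDOWN_XMON_RW))
	return -EPERM;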

Most of the callers are updated to pass current_cred() as the cred
pointer, thus maintaining the same behavior. The following callers are
modified to pass NULL as the cred pointer instead:
1. arch/powerpc/xmon/xmon.c
 Seems to be some interactive debugging facility. It appears that
 the lockdown hook is called from interrupt context here, so it
 should be more appropriate to request a global lockdown decision.
2. fs/tracefs/inode.c:tracefs_create_file()
 Here the call is used to prevent creating new tracefs entries when
 the kernel is locked down. Assumes that locking down is one-way -
 i.e. if the hook returns non-zero once, it will never return zero
 again, thus no point in creating these files. Also, the hook is
 often called by a module's init function when it is loaded by
 userspace, where it doesn't make much sense to do a check against
 the current task's creds, since the task itself doesn't actually
 use the tracing functionality (i.e. doesn't breach lockdown), just
 indirectly makes some new tracepoints available to whoever is
 authorized to use them.
3. net/xfrm/xfrm_user.c:copy_to_user_*()
 Here a cryptographic secret is redacted based on the value returned
 from the hook. There are two possible actions that may lead here:
 a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
task context is relevant, since the dumped data is sent back to
the current task.
 b) When adding/deleting/updating an SA via XFRM_MSG_xxxSA, the
dumped SA is broadcasted to tasks subscribed to XFRM events -
here the current task context is not relevant as it doesn't
represent the tasks that could potentially see the secret.
 It doesn't seem worth it to try to keep using the current task's
 context in the a) case, since the eventual data leak can be
 circumvented anyway via b), plus there is no way for the task to
 indicate that it doesn't care about the actual key value, so the
 check could generate a lot of "false alert" denials with SELinux.
 Thus, let's pass NULL instead of current_cred() here faute de
 mieux.
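
As an illustration of the resulting call pattern (a sketch only; the exact
argument order of the updated hook is an assumption based on the
description above):

	/* xmon may run in interrupt context: ask for a global decision */
	if (security_locked_down(NULL, LOCKDOWN_XMON_RW))
		return;

	/* a typical caller keeps the current task as the subject */
	if (security_locked_down(current_cred(), LOCKDOWN_DEV_MEM))
		return -EPERM;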

Improvements-suggested-by: Casey Schaufler 
Improvements-suggested-by: Paul Moore 
Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
Signed-off-by: Ondrej Mosnacek 
---

v3:
- add the cred argument to security_locked_down() and adapt all callers
- keep using current_cred() in BPF, as the hook calls have been shifted
  to program load time (commit ff40e51043af ("bpf, lockdown, audit: Fix
  buggy SELinux lockdown permission checks"))
- in SELinux, don't ignore hook calls where cred == NULL, but use
  SECINITSID_KERNEL as the subject instead
- update explanations in the commit message

v2: https://lore.kernel.org/lkml/20210517092006.803332-1-omosn...@redhat.com/
- change to a single hook based on suggestions by Casey Schaufler

v1: https://lore.kernel.org/lkml/20210507114048.138933-1-omosn...@redhat.com/

 arch/powerpc/xmon/xmon.c |  4 ++--
 arch/x86/kernel/ioport.c |  4 ++--
 arch/x86/kernel/msr.c|  4 ++--
 arch/x86/mm/testmmiotrace.c  |  2 +-
 drivers/acpi/acpi_configfs.c |  2 +-
 drivers/acpi/custom_method.c |  2 +-
 drivers/acpi/osl.c   |  3 ++-
 drivers/acpi/tables.c|  2 +-
 drivers/char/mem.c   |  2 +-
 drivers/cxl/mem.c|  2 +-
 drivers/firmware/efi/efi.c   |  2 +-
 drivers/firmware/efi/test/efi_test.c |  2 +-
 drivers/pci/pci-sysfs.c  |  6 +++---
 drivers/pci/proc.c   |  6 +++---
 drivers/pci/syscall.c|  2 +-
 drivers/pcmcia/cistpl.c  |  2 +-
 drivers/tty/serial/serial_core.c |  2 +-
 fs/debugfs/file.c|  2 +-
 fs/debugfs/inode.c   |  2 +-
 fs/proc/kcore.c  |  2 +-
 fs/tracefs/inode.c   |  2 +-
 include/linux/lsm_hook_defs.h|  2 +-
 include/linux/lsm_hooks.h|  1 +
 include/linux/security.h |  4 

[PATCH v13 3/3] kasan: define and use MAX_PTRS_PER_* for early shadow tables

2021-06-16 Thread Daniel Axtens
powerpc has a variable number of PTRS_PER_*, set at runtime based
on the MMU that the kernel is booted under.

This means the PTRS_PER_* are no longer compile-time constants, which
breaks the static sizing of the KASAN early shadow tables and therefore
the build.

Define default MAX_PTRS_PER_*s in the same style as MAX_PTRS_PER_P4D.
As KASAN is the only user at the moment, just define them in the kasan
header, and have them default to PTRS_PER_* unless overridden in arch
code.
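
For instance, an architecture whose page-table geometry is chosen at boot
can define the worst case across its MMUs before <linux/kasan.h> is pulled
in (a sketch along the lines of the powerpc series; the H_/R_ names stand
for the hash and radix MMU values and are assumptions here):

	#define MAX_PTRS_PER_PTE \
		((H_PTRS_PER_PTE > R_PTRS_PER_PTE) ? \
		 H_PTRS_PER_PTE : R_PTRS_PER_PTE)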

Suggested-by: Christophe Leroy 
Suggested-by: Balbir Singh 
Reviewed-by: Christophe Leroy 
Reviewed-by: Balbir Singh 
Signed-off-by: Daniel Axtens 
---
 include/linux/kasan.h | 18 +++---
 mm/kasan/init.c   |  6 +++---
 2 files changed, 18 insertions(+), 6 deletions(-)

diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index 768d7d342757..fd65f477ac92 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -40,10 +40,22 @@ struct kunit_kasan_expectation {
 #define PTE_HWTABLE_PTRS 0
 #endif
 
+#ifndef MAX_PTRS_PER_PTE
+#define MAX_PTRS_PER_PTE PTRS_PER_PTE
+#endif
+
+#ifndef MAX_PTRS_PER_PMD
+#define MAX_PTRS_PER_PMD PTRS_PER_PMD
+#endif
+
+#ifndef MAX_PTRS_PER_PUD
+#define MAX_PTRS_PER_PUD PTRS_PER_PUD
+#endif
+
 extern unsigned char kasan_early_shadow_page[PAGE_SIZE];
-extern pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS];
-extern pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD];
-extern pud_t kasan_early_shadow_pud[PTRS_PER_PUD];
+extern pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS];
+extern pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD];
+extern pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD];
 extern p4d_t kasan_early_shadow_p4d[MAX_PTRS_PER_P4D];
 
 int kasan_populate_early_shadow(const void *shadow_start,
diff --git a/mm/kasan/init.c b/mm/kasan/init.c
index 348f31d15a97..cc64ed6858c6 100644
--- a/mm/kasan/init.c
+++ b/mm/kasan/init.c
@@ -41,7 +41,7 @@ static inline bool kasan_p4d_table(pgd_t pgd)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 3
-pud_t kasan_early_shadow_pud[PTRS_PER_PUD] __page_aligned_bss;
+pud_t kasan_early_shadow_pud[MAX_PTRS_PER_PUD] __page_aligned_bss;
 static inline bool kasan_pud_table(p4d_t p4d)
 {
return p4d_page(p4d) == virt_to_page(lm_alias(kasan_early_shadow_pud));
@@ -53,7 +53,7 @@ static inline bool kasan_pud_table(p4d_t p4d)
 }
 #endif
 #if CONFIG_PGTABLE_LEVELS > 2
-pmd_t kasan_early_shadow_pmd[PTRS_PER_PMD] __page_aligned_bss;
+pmd_t kasan_early_shadow_pmd[MAX_PTRS_PER_PMD] __page_aligned_bss;
 static inline bool kasan_pmd_table(pud_t pud)
 {
return pud_page(pud) == virt_to_page(lm_alias(kasan_early_shadow_pmd));
@@ -64,7 +64,7 @@ static inline bool kasan_pmd_table(pud_t pud)
return false;
 }
 #endif
-pte_t kasan_early_shadow_pte[PTRS_PER_PTE + PTE_HWTABLE_PTRS]
+pte_t kasan_early_shadow_pte[MAX_PTRS_PER_PTE + PTE_HWTABLE_PTRS]
__page_aligned_bss;
 
 static inline bool kasan_pte_table(pmd_t pmd)
-- 
2.30.2



[PATCH v13 2/3] kasan: allow architectures to provide an outline readiness check

2021-06-16 Thread Daniel Axtens
Allow architectures to define a kasan_arch_is_ready() hook that bails
out of any function that's about to touch the shadow unless the arch
says that it is ready for the memory to be accessed. This is fairly
non-invasive and should have a negligible performance penalty.

This will only work in outline mode, so an arch must specify
ARCH_DISABLE_KASAN_INLINE if it requires this.

Cc: Balbir Singh 
Cc: Aneesh Kumar K.V 
Suggested-by: Christophe Leroy 
Signed-off-by: Daniel Axtens 

--

I discuss the justification for this later in the series. Also,
both previous RFCs for ppc64 - by 2 different people - have
needed this trick! See:
 - https://lore.kernel.org/patchwork/patch/592820/ # ppc64 hash series
 - https://patchwork.ozlabs.org/patch/795211/  # ppc radix series
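
An arch override would then look roughly like this (a sketch modelled on
the powerpc RFCs linked above; the static key name is an assumption):

	/* arch/<arch>/include/asm/kasan.h */
	DECLARE_STATIC_KEY_FALSE(powerpc_kasan_enabled_key);

	static __always_inline bool kasan_arch_is_ready(void)
	{
		return static_branch_likely(&powerpc_kasan_enabled_key);
	}
	#define kasan_arch_is_ready kasan_arch_is_ready
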
---
 mm/kasan/common.c  | 4 
 mm/kasan/generic.c | 3 +++
 mm/kasan/kasan.h   | 4 
 mm/kasan/shadow.c  | 8 
 4 files changed, 19 insertions(+)

diff --git a/mm/kasan/common.c b/mm/kasan/common.c
index 10177cc26d06..0ad615f3801d 100644
--- a/mm/kasan/common.c
+++ b/mm/kasan/common.c
@@ -331,6 +331,10 @@ static inline bool kasan_slab_free(struct kmem_cache 
*cache, void *object,
u8 tag;
void *tagged_object;
 
+   /* Bail if the arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return false;
+
tag = get_tag(object);
tagged_object = object;
object = kasan_reset_tag(object);
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 53cbf28859b5..c3f5ba7a294a 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -163,6 +163,9 @@ static __always_inline bool check_region_inline(unsigned 
long addr,
size_t size, bool write,
unsigned long ret_ip)
 {
+   if (!kasan_arch_is_ready())
+   return true;
+
if (unlikely(size == 0))
return true;
 
diff --git a/mm/kasan/kasan.h b/mm/kasan/kasan.h
index 8f450bc28045..19323a3d5975 100644
--- a/mm/kasan/kasan.h
+++ b/mm/kasan/kasan.h
@@ -449,6 +449,10 @@ static inline void kasan_poison_last_granule(const void 
*address, size_t size) {
 
 #endif /* CONFIG_KASAN_GENERIC */
 
+#ifndef kasan_arch_is_ready
+static inline bool kasan_arch_is_ready(void)   { return true; }
+#endif
+
 /*
  * Exported functions for interfaces called from assembly or from generated
  * code. Declarations here to avoid warning about missing declarations.
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index 082ee5b6d9a1..3c7f7efe6f68 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -73,6 +73,10 @@ void kasan_poison(const void *addr, size_t size, u8 value, 
bool init)
 {
void *shadow_start, *shadow_end;
 
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
/*
 * Perform shadow offset calculation based on untagged address, as
 * some of the callers (e.g. kasan_poison_object_data) pass tagged
@@ -99,6 +103,10 @@ EXPORT_SYMBOL(kasan_poison);
 #ifdef CONFIG_KASAN_GENERIC
 void kasan_poison_last_granule(const void *addr, size_t size)
 {
+   /* Don't touch the shadow memory if arch isn't ready */
+   if (!kasan_arch_is_ready())
+   return;
+
if (size & KASAN_GRANULE_MASK) {
u8 *shadow = (u8 *)kasan_mem_to_shadow(addr + size);
*shadow = size & KASAN_GRANULE_MASK;
-- 
2.30.2



[PATCH v13 1/3] kasan: allow an architecture to disable inline instrumentation

2021-06-16 Thread Daniel Axtens
For annoying architectural reasons, it's very difficult to support inline
instrumentation on powerpc64.*

Add a Kconfig flag to allow an arch to disable inline. (It's a bit
annoying to be 'backwards', but I'm not aware of any way to have
an arch force a symbol to be 'n', rather than 'y'.)

We also disable stack instrumentation in this case as it does things that
are functionally equivalent to inline instrumentation, namely adding
code that touches the shadow directly without going through a C helper.

* on ppc64 atm, the shadow lives in virtual memory and isn't accessible in
real mode. However, before we turn on virtual memory, we parse the device
tree to determine which platform and MMU we're running under. That calls
generic DT code, which is instrumented. Inline instrumentation in DT would
unconditionally attempt to touch the shadow region, which we won't have
set up yet, and would crash. We can make outline mode wait for the arch to
be ready, but we can't change what the compiler inserts for inline mode.
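
An architecture would opt out from its Kconfig, e.g. (a sketch; the exact
select condition is an assumption, not part of this series):

	config PPC
		...
		select ARCH_DISABLE_KASAN_INLINE if PPC_RADIX_MMU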

Signed-off-by: Daniel Axtens 
---
 lib/Kconfig.kasan | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index cffc2ebbf185..cb5e02d09e11 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -12,6 +12,15 @@ config HAVE_ARCH_KASAN_HW_TAGS
 config HAVE_ARCH_KASAN_VMALLOC
bool
 
+config ARCH_DISABLE_KASAN_INLINE
+   bool
+   help
+ Sometimes an architecture might not be able to support inline
+ instrumentation but might be able to support outline instrumentation.
+ This option allows an architecture to prevent inline and stack
+ instrumentation from being enabled.
+
+
 config CC_HAS_KASAN_GENERIC
def_bool $(cc-option, -fsanitize=kernel-address)
 
@@ -130,6 +139,7 @@ config KASAN_OUTLINE
 
 config KASAN_INLINE
bool "Inline instrumentation"
+   depends on !ARCH_DISABLE_KASAN_INLINE
help
  Compiler directly inserts code checking shadow memory before
  memory accesses. This is faster than outline (in some workloads
@@ -141,6 +151,7 @@ endchoice
 config KASAN_STACK
bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && 
!COMPILE_TEST
depends on KASAN_GENERIC || KASAN_SW_TAGS
+   depends on !ARCH_DISABLE_KASAN_INLINE
default y if CC_IS_GCC
help
  The LLVM stack address sanitizer has a know problem that
@@ -154,6 +165,9 @@ config KASAN_STACK
  but clang users can still enable it for builds without
  CONFIG_COMPILE_TEST.  On gcc it is assumed to always be safe
  to use and enabled by default.
+ If the architecture disables inline instrumentation, this is
+ also disabled as it adds inline-style instrumentation that
+ is run unconditionally.
 
 config KASAN_SW_TAGS_IDENTIFY
bool "Enable memory corruption identification"
-- 
2.30.2



[PATCH v13 0/3] KASAN core changes for ppc64 radix KASAN

2021-06-16 Thread Daniel Axtens
Building on the work of Christophe, Aneesh and Balbir, I've ported
KASAN to 64-bit Book3S kernels running on the Radix MMU. I've been
trying this for a while, but we keep having collisions between the
kasan code in the mm tree and the code I want to put in to the ppc
tree.

So this series just contains the kasan core changes that we
need. These can go in via the mm tree. I will then propose the powerpc
changes for a later cycle. (The most recent RFC for the powerpc
changes is in the last series at
https://lore.kernel.org/linux-mm/20210615014705.2234866-1-...@axtens.net/
)

v13 applies to next-20210611. There should be no noticeable changes to
other platforms.

Changes since v12: respond to Marco's review comments - clean up the
help for ARCH_DISABLE_KASAN_INLINE, and add an arch readiness check to
the new granule poisoning function. Thanks Marco.

Kind regards,
Daniel

Daniel Axtens (3):
  kasan: allow an architecture to disable inline instrumentation
  kasan: allow architectures to provide an outline readiness check
  kasan: define and use MAX_PTRS_PER_* for early shadow tables

 include/linux/kasan.h | 18 +++---
 lib/Kconfig.kasan | 14 ++
 mm/kasan/common.c |  4 
 mm/kasan/generic.c|  3 +++
 mm/kasan/init.c   |  6 +++---
 mm/kasan/kasan.h  |  4 
 mm/kasan/shadow.c |  8 
 7 files changed, 51 insertions(+), 6 deletions(-)

-- 
2.30.2



Re: [PATCH v12 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-16 Thread Christoph Hellwig
On Wed, Jun 16, 2021 at 02:21:54PM +0800, Claire Chang wrote:
> Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc to
> support the memory allocation from restricted DMA pool.
> 
> The restricted DMA pool is preferred if available.
> 
> Note that since coherent allocation needs remapping, one must set up
> another device coherent pool by shared-dma-pool and use
> dma_alloc_from_dev_coherent instead for atomic coherent allocation.
> 
> Signed-off-by: Claire Chang 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v12 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-16 Thread Christoph Hellwig
On Wed, Jun 16, 2021 at 02:21:51PM +0800, Claire Chang wrote:
> Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
> use it to determine whether to bounce the data or not. This will be
> useful later to allow for different pools.
> 
> Signed-off-by: Claire Chang 

Looks good,

Reviewed-by: Christoph Hellwig 


[RFC PATCH powerpc] powerpc/64: fix duplicated inclusion

2021-06-16 Thread kernel test robot
arch/powerpc/kernel/interrupt_64.S: asm/head-64.h is included more than once.

Generated by: scripts/checkincludes.pl

Reported-by: kernel test robot 
Signed-off-by: kernel test robot 
---
 interrupt_64.S |1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/kernel/interrupt_64.S 
b/arch/powerpc/kernel/interrupt_64.S
index 83826775d239a..b201b1ef30d10 100644
--- a/arch/powerpc/kernel/interrupt_64.S
+++ b/arch/powerpc/kernel/interrupt_64.S
@@ -1,5 +1,4 @@
 #include 
-#include <asm/head-64.h>
 #include 
 #include 
 #include 


[powerpc:next-test 113/124] arch/powerpc/kernel/interrupt_64.S: asm/head-64.h is included more than once.

2021-06-16 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git 
next-test
head:   103bf32b0d2dd8b8a4d3d9ebdded5ba4e8263e6a
commit: 5592c877d21b6ca201aafca349663c5a41f134f0 [113/124] powerpc/64: move 
interrupt return asm to interrupt_64.S
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 


includecheck warnings: (new ones prefixed by >>)
>> arch/powerpc/kernel/interrupt_64.S: asm/head-64.h is included more than once.

Please review and possibly fold the followup patch.

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


Re: Oops (NULL pointer) with 'perf record' of selftest 'null_syscall'

2021-06-16 Thread Christophe Leroy




Le 16/06/2021 à 08:33, Madhavan Srinivasan a écrit :


On 6/16/21 11:56 AM, Christophe Leroy wrote:



Le 16/06/2021 à 05:40, Athira Rajeev a écrit :




On 16-Jun-2021, at 8:53 AM, Madhavan Srinivasan  wrote:


On 6/15/21 8:35 PM, Christophe Leroy wrote:
For your information, I'm getting the following Oops. Detected with 5.13-rc6, it also oopses on 
5.12 and 5.11.

Runs ok on 5.10. I'm starting bisecting now.



Thanks for reporting, got the issue. What has happened in this case is that
the pmu device is not registered, and we are trying to access the instruction
pointer, which lands in perf_instruction_pointer(). Recently I added a
workaround patch for power10 DD1 which has caused this breakage. My bad. We
are working on a fix patch for the same and will post it out. Sorry again.



Hi Christophe,

Can you please try with below patch in your environment and test if it works 
for you.

 From 55d3afc9369dfbe28a7152c8e9f856c11c7fe43d Mon Sep 17 00:00:00 2001
From: Athira Rajeev 
Date: Tue, 15 Jun 2021 22:28:11 -0400
Subject: [PATCH] powerpc/perf: Fix crash with 'perf_instruction_pointer' when
  pmu is not set

On systems without any specific PMU driver support registered, running
perf record causes oops:

[   38.841073] NIP [c013af54] perf_instruction_pointer+0x24/0x100
[   38.841079] LR [c03c7358] perf_prepare_sample+0x4e8/0x820
[   38.841085] --- interrupt: 300
[   38.841088] [c0001cf03440] [c03c6ef8] 
perf_prepare_sample+0x88/0x820 (unreliable)
[   38.841096] [c0001cf034a0] [c03c76d0] 
perf_event_output_forward+0x40/0xc0
[   38.841104] [c0001cf03520] [c03b45e8] 
__perf_event_overflow+0x88/0x1b0
[   38.841112] [c0001cf03570] [c03b480c] 
perf_swevent_hrtimer+0xfc/0x1a0
[   38.841119] [c0001cf03740] [c02399cc] 
__hrtimer_run_queues+0x17c/0x380
[   38.841127] [c0001cf037c0] [c023a5f8] 
hrtimer_interrupt+0x128/0x2f0
[   38.841135] [c0001cf03870] [c002962c] timer_interrupt+0x13c/0x370
[   38.841143] [c0001cf038d0] [c0009ba4] 
decrementer_common_virt+0x1a4/0x1b0
[   38.841151] --- interrupt: 900 at copypage_power7+0xd4/0x1c0

During perf record session, perf_instruction_pointer() is called to
capture the sample ip. This function in core-book3s accesses ppmu->flags.
If a platform specific PMU driver is not registered, ppmu is set to NULL
and accessing its members results in a crash. Fix this crash by checking
if ppmu is set.

Signed-off-by: Athira Rajeev 
Reported-by: Christophe Leroy 


Fixes: 2ca13a4cc56c ("powerpc/perf: Use regs->nip when SIAR is zero")
Cc: sta...@vger.kernel.org
Tested-by: Christophe Leroy 

Thanks, but I just wonder what the system config and processor version is on
which you got this failure. The reason I ask is that we do have a generic
PMU which should kick in in the absence of a platform-specific driver.




It's an mpc8321 (book3s/32)

Christophe


[PATCH V2 3/3] cpufreq: powernv: Migrate to ->exit() callback instead of ->stop_cpu()

2021-06-16 Thread Viresh Kumar
commit 367dc4aa932b ("cpufreq: Add stop CPU callback to cpufreq_driver
interface") added the stop_cpu() callback to allow the drivers to do
clean up before the CPU is completely down and its state can't be
modified.

At that time the CPU hotplug framework used to call the cpufreq core's
registered notifier for different events like CPU_DOWN_PREPARE and
CPU_POST_DEAD. The stop_cpu() callback was called during the
CPU_DOWN_PREPARE event.

This is no longer the case, cpuhp_cpufreq_offline() is called only once
by the CPU hotplug core now and we don't really need two separate
callbacks for cpufreq drivers, i.e. stop_cpu() and exit(), as everything
can be done from the exit() callback itself.

Migrate to using the exit() callback instead of stop_cpu().

Signed-off-by: Viresh Kumar 
---
 drivers/cpufreq/powernv-cpufreq.c | 23 +--
 1 file changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/cpufreq/powernv-cpufreq.c 
b/drivers/cpufreq/powernv-cpufreq.c
index e439b43c19eb..005600cef273 100644
--- a/drivers/cpufreq/powernv-cpufreq.c
+++ b/drivers/cpufreq/powernv-cpufreq.c
@@ -875,7 +875,15 @@ static int powernv_cpufreq_cpu_init(struct cpufreq_policy 
*policy)
 
 static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)
 {
-   /* timer is deleted in cpufreq_cpu_stop() */
+   struct powernv_smp_call_data freq_data;
+   struct global_pstate_info *gpstates = policy->driver_data;
+
+   freq_data.pstate_id = idx_to_pstate(powernv_pstate_info.min);
+   freq_data.gpstate_id = idx_to_pstate(powernv_pstate_info.min);
+   smp_call_function_single(policy->cpu, set_pstate, &freq_data, 1);
+   if (gpstates)
+   del_timer_sync(&gpstates->timer);
+
kfree(policy->driver_data);
 
return 0;
@@ -1007,18 +1015,6 @@ static struct notifier_block powernv_cpufreq_opal_nb = {
.priority   = 0,
 };
 
-static void powernv_cpufreq_stop_cpu(struct cpufreq_policy *policy)
-{
-   struct powernv_smp_call_data freq_data;
-   struct global_pstate_info *gpstates = policy->driver_data;
-
-   freq_data.pstate_id = idx_to_pstate(powernv_pstate_info.min);
-   freq_data.gpstate_id = idx_to_pstate(powernv_pstate_info.min);
-   smp_call_function_single(policy->cpu, set_pstate, &freq_data, 1);
-   if (gpstates)
-   del_timer_sync(&gpstates->timer);
-}
-
 static unsigned int powernv_fast_switch(struct cpufreq_policy *policy,
unsigned int target_freq)
 {
@@ -1042,7 +1038,6 @@ static struct cpufreq_driver powernv_cpufreq_driver = {
.target_index   = powernv_cpufreq_target_index,
.fast_switch= powernv_fast_switch,
.get= powernv_cpufreq_get,
-   .stop_cpu   = powernv_cpufreq_stop_cpu,
.attr   = powernv_cpu_freq_attr,
 };
 
-- 
2.31.1.272.g89b43f80a514



[PATCH V2 0/3] cpufreq: Migrate away from ->stop_cpu() callback

2021-06-16 Thread Viresh Kumar
Hi Rafael,

Sending these separately from CPPC stuff to avoid unnecessary confusion and
independent merging of these patches. These should get in nevertheless.

commit 367dc4aa932b ("cpufreq: Add stop CPU callback to cpufreq_driver
interface") added the stop_cpu() callback to allow the drivers to do
clean up before the CPU is completely down and its state can't be
modified.

At that time the CPU hotplug framework used to call the cpufreq core's
registered notifier for different events like CPU_DOWN_PREPARE and
CPU_POST_DEAD. The stop_cpu() callback was called during the
CPU_DOWN_PREPARE event.

This is no longer the case, cpuhp_cpufreq_offline() is called only once
by the CPU hotplug core now and we don't really need two separate
callbacks for cpufreq drivers, i.e. stop_cpu() and exit(), as everything
can be done from the exit() callback itself.

Migrate to using the exit() callback instead of stop_cpu().

The stop_cpu() callback isn't removed from core as it will be reused in
a different way in a separate patchset.

--
Viresh

Viresh Kumar (3):
  cpufreq: cppc: Migrate to ->exit() callback instead of ->stop_cpu()
  cpufreq: intel_pstate: Migrate to ->exit() callback instead of
->stop_cpu()
  cpufreq: powernv: Migrate to ->exit() callback instead of
->stop_cpu()

 drivers/cpufreq/cppc_cpufreq.c| 46 ---
 drivers/cpufreq/intel_pstate.c|  9 +-
 drivers/cpufreq/powernv-cpufreq.c | 23 ++--
 3 files changed, 34 insertions(+), 44 deletions(-)

-- 
2.31.1.272.g89b43f80a514



Re: Oops (NULL pointer) with 'perf record' of selftest 'null_syscall'

2021-06-16 Thread Madhavan Srinivasan



On 6/16/21 11:56 AM, Christophe Leroy wrote:



Le 16/06/2021 à 05:40, Athira Rajeev a écrit :



On 16-Jun-2021, at 8:53 AM, Madhavan Srinivasan 
 wrote:



On 6/15/21 8:35 PM, Christophe Leroy wrote:
For your information, I'm getting the following Oops. Detected with 
5.13-rc6, it also oopses on 5.12 and 5.11.

Runs ok on 5.10. I'm starting bisecting now.



Thanks for reporting, got the issue. What has happened in this case is that
the pmu device is not registered, and we are trying to access the instruction
pointer, which lands in perf_instruction_pointer(). Recently I added a
workaround patch for power10 DD1 which has caused this breakage. My bad. We
are working on a fix patch for the same and will post it out. Sorry again.



Hi Christophe,

Can you please try with below patch in your environment and test if 
it works for you.


 From 55d3afc9369dfbe28a7152c8e9f856c11c7fe43d Mon Sep 17 00:00:00 2001
From: Athira Rajeev 
Date: Tue, 15 Jun 2021 22:28:11 -0400
Subject: [PATCH] powerpc/perf: Fix crash with 
'perf_instruction_pointer' when

  pmu is not set

On systems without any specific PMU driver support registered, running
perf record causes oops:

[   38.841073] NIP [c013af54] 
perf_instruction_pointer+0x24/0x100

[   38.841079] LR [c03c7358] perf_prepare_sample+0x4e8/0x820
[   38.841085] --- interrupt: 300
[   38.841088] [c0001cf03440] [c03c6ef8] 
perf_prepare_sample+0x88/0x820 (unreliable)
[   38.841096] [c0001cf034a0] [c03c76d0] 
perf_event_output_forward+0x40/0xc0
[   38.841104] [c0001cf03520] [c03b45e8] 
__perf_event_overflow+0x88/0x1b0
[   38.841112] [c0001cf03570] [c03b480c] 
perf_swevent_hrtimer+0xfc/0x1a0
[   38.841119] [c0001cf03740] [c02399cc] 
__hrtimer_run_queues+0x17c/0x380
[   38.841127] [c0001cf037c0] [c023a5f8] 
hrtimer_interrupt+0x128/0x2f0
[   38.841135] [c0001cf03870] [c002962c] 
timer_interrupt+0x13c/0x370
[   38.841143] [c0001cf038d0] [c0009ba4] 
decrementer_common_virt+0x1a4/0x1b0

[   38.841151] --- interrupt: 900 at copypage_power7+0xd4/0x1c0

During perf record session, perf_instruction_pointer() is called to
capture the sample ip. This function in core-book3s accesses 
ppmu->flags.

If a platform specific PMU driver is not registered, ppmu is set to NULL
and accessing its members results in a crash. Fix this crash by checking
if ppmu is set.

Signed-off-by: Athira Rajeev 
Reported-by: Christophe Leroy 


Fixes: 2ca13a4cc56c ("powerpc/perf: Use regs->nip when SIAR is zero")
Cc: sta...@vger.kernel.org
Tested-by: Christophe Leroy 
Thanks, but I just wonder what the system config and processor version is on
which you got this failure. The reason I ask is that we do have a generic
PMU which should kick in in the absence of a platform-specific driver.


Maddy



---
  arch/powerpc/perf/core-book3s.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/core-book3s.c 
b/arch/powerpc/perf/core-book3s.c

index 16d4d1b6a1ff..816756588cb7 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2254,7 +2254,7 @@ unsigned long perf_instruction_pointer(struct 
pt_regs *regs)

  bool use_siar = regs_use_siar(regs);
  unsigned long siar = mfspr(SPRN_SIAR);
  -    if (ppmu->flags & PPMU_P10_DD1) {
+    if (ppmu && ppmu->flags & PPMU_P10_DD1) {
  if (siar)
  return siar;
  else



Re: [PATCH v11 00/12] Restricted DMA

2021-06-16 Thread Claire Chang
v12: https://lore.kernel.org/patchwork/cover/1447254/

On Wed, Jun 16, 2021 at 11:52 AM Claire Chang  wrote:
>
> This series implements mitigations for lack of DMA access control on
> systems without an IOMMU, which could result in the DMA accessing the
> system memory at unexpected times and/or unexpected addresses, possibly
> leading to data leakage or corruption.
>
> For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> not behind an IOMMU. As PCI-e, by design, gives the device full access to
> system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> full chain of exploits; [2], [3]).
>
> To mitigate the security concerns, we introduce restricted DMA. Restricted
> DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> specially allocated region and does memory allocation from the same region.
> The feature on its own provides a basic level of protection against the DMA
> overwriting buffer contents at unexpected times. However, to protect
> against general data leakage and system memory corruption, the system needs
> to provide a way to restrict the DMA to a predefined memory region (this is
> usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
>
> [1a] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> [1b] 
> https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> [2] https://blade.tencent.com/en/advisories/qualpwn/
> [3] 
> https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> [4] 
> https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
>
> v11:
> - Rebase against swiotlb devel/for-linus-5.14
> - s/mempry/memory/g
> - exchange the order of patch 09/12 and 10/12
> https://lore.kernel.org/patchwork/cover/1446882/
>
> v10:
> Address the comments in v9 to
>   - fix the dev->dma_io_tlb_mem assignment
>   - propagate swiotlb_force setting into io_tlb_default_mem->force
>   - move set_memory_decrypted out of swiotlb_init_io_tlb_mem
>   - move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
>   - add swiotlb_ prefix to find_slots and release_slots
>   - merge the 3 alloc/free related patches
>   - move the CONFIG_DMA_RESTRICTED_POOL later
>
> v9:
> Address the comments in v7 to
>   - set swiotlb active pool to dev->dma_io_tlb_mem
>   - get rid of get_io_tlb_mem
>   - dig out the device struct for is_swiotlb_active
>   - move debugfs_create_dir out of swiotlb_create_debugfs
>   - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
>   - use IS_ENABLED in kernel/dma/direct.c
>   - fix redefinition of 'of_dma_set_restricted_buffer'
> https://lore.kernel.org/patchwork/cover/1445081/
>
> v8:
> - Fix reserved-memory.txt and add the reg property in example.
> - Fix sizeof for of_property_count_elems_of_size in
>   drivers/of/address.c#of_dma_set_restricted_buffer.
> - Apply Will's suggestion to try the OF node having DMA configuration in
>   drivers/of/address.c#of_dma_set_restricted_buffer.
> - Fix typo in the comment of 
> drivers/of/address.c#of_dma_set_restricted_buffer.
> - Add error message for PageHighMem in
>   kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
>   rmem_swiotlb_setup.
> - Fix the message string in rmem_swiotlb_setup.
> https://lore.kernel.org/patchwork/cover/1437112/
>
> v7:
> Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
> https://lore.kernel.org/patchwork/cover/1431031/
>
> v6:
> Address the comments in v5
> https://lore.kernel.org/patchwork/cover/1423201/
>
> v5:
> Rebase on latest linux-next
> https://lore.kernel.org/patchwork/cover/1416899/
>
> v4:
> - Fix spinlock bad magic
> - Use rmem->name for debugfs entry
> - Address the comments in v3
> https://lore.kernel.org/patchwork/cover/1378113/
>
> v3:
> Using only one reserved memory region for both streaming DMA and memory
> allocation.
> https://lore.kernel.org/patchwork/cover/1360992/
>
> v2:
> Building on top of swiotlb.
> https://lore.kernel.org/patchwork/cover/1280705/
>
> v1:
> Using dma_map_ops.
> https://lore.kernel.org/patchwork/cover/1271660/
>
> Claire Chang (12):
>   swiotlb: Refactor swiotlb init functions
>   swiotlb: Refactor swiotlb_create_debugfs
>   swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
>   swiotlb: Update is_swiotlb_buffer to add a struct device argument
>   swiotlb: Update is_swiotlb_active to add a struct device argument
>   swiotlb: Use is_dev_swiotlb_force for swiotlb data bouncing
>   swiotlb: Move alloc_size to swiotlb_find_slots
>   swiotlb: Refactor swiotlb_tbl_unmap_single
>   swiotlb: Add restricted DMA alloc/free support
>   swiotlb: Add restricted DMA pool initialization
>   dt-bindings: of: Add restricted DMA pool
>   of: Add plumbing for restricted DMA pool
>
>  

Re: Oops (NULL pointer) with 'perf record' of selftest 'null_syscall'

2021-06-16 Thread Christophe Leroy




Le 16/06/2021 à 05:40, Athira Rajeev a écrit :




On 16-Jun-2021, at 8:53 AM, Madhavan Srinivasan  wrote:


On 6/15/21 8:35 PM, Christophe Leroy wrote:

For your information, I'm getting the following Oops. Detected with 5.13-rc6, 
it also oopses on 5.12 and 5.11.
Runs ok on 5.10. I'm starting bisecting now.



Thanks for reporting, got the issue. What has happened in this case is that
the pmu device is not registered, and we are trying to access the instruction
pointer, which lands in perf_instruction_pointer(). Recently I added a
workaround patch for power10 DD1 which has caused this breakage. My bad. We
are working on a fix patch for the same and will post it out. Sorry again.



Hi Christophe,

Can you please try with below patch in your environment and test if it works 
for you.

 From 55d3afc9369dfbe28a7152c8e9f856c11c7fe43d Mon Sep 17 00:00:00 2001
From: Athira Rajeev 
Date: Tue, 15 Jun 2021 22:28:11 -0400
Subject: [PATCH] powerpc/perf: Fix crash with 'perf_instruction_pointer' when
  pmu is not set

On systems without any specific PMU driver support registered, running
perf record causes oops:

[   38.841073] NIP [c013af54] perf_instruction_pointer+0x24/0x100
[   38.841079] LR [c03c7358] perf_prepare_sample+0x4e8/0x820
[   38.841085] --- interrupt: 300
[   38.841088] [c0001cf03440] [c03c6ef8] 
perf_prepare_sample+0x88/0x820 (unreliable)
[   38.841096] [c0001cf034a0] [c03c76d0] 
perf_event_output_forward+0x40/0xc0
[   38.841104] [c0001cf03520] [c03b45e8] 
__perf_event_overflow+0x88/0x1b0
[   38.841112] [c0001cf03570] [c03b480c] 
perf_swevent_hrtimer+0xfc/0x1a0
[   38.841119] [c0001cf03740] [c02399cc] 
__hrtimer_run_queues+0x17c/0x380
[   38.841127] [c0001cf037c0] [c023a5f8] 
hrtimer_interrupt+0x128/0x2f0
[   38.841135] [c0001cf03870] [c002962c] timer_interrupt+0x13c/0x370
[   38.841143] [c0001cf038d0] [c0009ba4] 
decrementer_common_virt+0x1a4/0x1b0
[   38.841151] --- interrupt: 900 at copypage_power7+0xd4/0x1c0

During perf record session, perf_instruction_pointer() is called to
capture the sample ip. This function in core-book3s accesses ppmu->flags.
If a platform specific PMU driver is not registered, ppmu is set to NULL
and accessing its members results in a crash. Fix this crash by checking
if ppmu is set.

Signed-off-by: Athira Rajeev 
Reported-by: Christophe Leroy 


Fixes: 2ca13a4cc56c ("powerpc/perf: Use regs->nip when SIAR is zero")
Cc: sta...@vger.kernel.org
Tested-by: Christophe Leroy 


---
  arch/powerpc/perf/core-book3s.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 16d4d1b6a1ff..816756588cb7 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2254,7 +2254,7 @@ unsigned long perf_instruction_pointer(struct pt_regs 
*regs)
bool use_siar = regs_use_siar(regs);
unsigned long siar = mfspr(SPRN_SIAR);
  
-	if (ppmu->flags & PPMU_P10_DD1) {
+   if (ppmu && ppmu->flags & PPMU_P10_DD1) {
if (siar)
return siar;
else



[PATCH v12 12/12] of: Add plumbing for restricted DMA pool

2021-06-16 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when the restricted-dma-pool is present.

Signed-off-by: Claire Chang 
---
 drivers/of/address.c| 33 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 73ddf2540f3f..cdf700fba5c4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1022,6 +1023,38 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
of_node_put(node);
return ret;
 }
+
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+   struct device_node *node, *of_node = dev->of_node;
+   int count, i;
+
+   count = of_property_count_elems_of_size(of_node, "memory-region",
+   sizeof(u32));
+   /*
+* If dev->of_node doesn't exist or doesn't contain memory-region, try
+* the OF node having DMA configuration.
+*/
+   if (count <= 0) {
+   of_node = np;
+   count = of_property_count_elems_of_size(
+   of_node, "memory-region", sizeof(u32));
+   }
+
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(of_node, "memory-region", i);
+   /*
+* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(dev, of_node,
+ i);
+   }
+
+   return 0;
+}
 #endif /* CONFIG_HAS_DMA */
 
 /**
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 6cb86de404f1..e68316836a7a 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev, np);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d9e6a324de0a..25cebbed5f02 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -161,12 +161,18 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+  struct device_node *np)
+{
+   return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.32.0.272.g935e593368-goog



[PATCH v12 11/12] dt-bindings: of: Add restricted DMA pool

2021-06-16 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
---
 .../reserved-memory/reserved-memory.txt   | 36 +--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..46804f24df05 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be 
used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to 
data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage 
and
+  system memory corruption, the system needs to provide a way to lock down
+  the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for 
atomic
+  coherent allocation.
 - vendor specific string in the form ,[-]
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for 
each corresponding
 
 Example
 ---
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
 one default of all device drivers (named linux,cma@7200 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@7800, 8MiB), and
-one for multimedia processing (named multimedia-memory@7700, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@7800, 8MiB),
+one for multimedia processing (named multimedia-memory@7700, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x5000, 64MiB).
 
 / {
#address-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named 
multimedia-memory@7700, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x7700 0x400>;
};
+
+   restricted_dma_reserved: restricted_dma_reserved {
+   compatible = "restricted-dma-pool";
+   reg = <0x5000 0x400>;
+   };
};
 
/* ... */
@@ -138,4 +161,11 @@ one for multimedia processing (named 
multimedia-memory@7700, 64MiB).
	memory-region = <&display_reserved>;
/* ... */
};
+
+   pcie_device: pcie_device@0,0 {
+   reg = <0x8301 0x0 0x 0x0 0x0010
+  0x8301 0x0 0x0010 0x0 0x0010>;
+   memory-region = <&restricted_dma_mem_reserved>;
+   /* ... */
+   };
 };
-- 
2.32.0.272.g935e593368-goog



[PATCH v12 10/12] swiotlb: Add restricted DMA pool initialization

2021-06-16 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.

Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 include/linux/swiotlb.h |  3 +-
 kernel/dma/Kconfig  | 14 
 kernel/dma/swiotlb.c| 76 +
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a73fad460162..175b6c113ed8 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ <Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt>
+ and <kernel/dma/swiotlb.c>.
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index d3d4f1a25fee..8a4d4ad4335e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include 
+#include 
+#include 
+#include 
+#include 
+#endif
 
 #include 
 #include 
@@ -735,4 +742,73 @@ bool swiotlb_free(struct device *dev, struct page *page, 
size_t size)
return true;
 }
 
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   /*
+* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+   mem->force_bounce = true;
+   mem->for_alloc = true;
+   set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
+rmem->size >> PAGE_SHIFT);
+
+   rmem->priv = mem;
+
+   if (IS_ENABLED(CONFIG_DEBUG_FS)) {
+   mem->debugfs =
+   debugfs_create_dir(rmem->name, debugfs_dir);
+   swiotlb_create_debugfs_files(mem);
+   }
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+   if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base {
+   pr_err("Restricted DMA pool must be accessible within the 
linear mapping.");
+   return -EINVAL;
+   }
+
+   rmem->ops = &rmem_swiotlb_ops;
+  

[PATCH v12 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-16 Thread Claire Chang
Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc to
support the memory allocation from restricted DMA pool.

The restricted DMA pool is preferred if available.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.
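
In device-tree terms, that pairing would look roughly like this (a sketch;
node names, addresses and sizes are made up for illustration):

	wifi_restricted_dma: restricted-dma@50000000 {
		compatible = "restricted-dma-pool";
		reg = <0x50000000 0x4000000>;
	};

	wifi_coherent_pool: wifi-coherent@54000000 {
		compatible = "shared-dma-pool";
		reg = <0x54000000 0x400000>;
	};

	wifi: wifi@0,0 {
		memory-region = <&wifi_restricted_dma>, <&wifi_coherent_pool>;
		/* ... */
	};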

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 26 ++
 kernel/dma/direct.c | 49 +++--
 kernel/dma/swiotlb.c| 38 ++--
 3 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8d8855c77d9a..a73fad460162 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
+ * @for_alloc:  %true if the pool is used for memory allocation
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -96,6 +97,7 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool late_alloc;
bool force_bounce;
+   bool for_alloc;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -156,4 +158,28 @@ static inline void swiotlb_adjust_size(unsigned long size)
 extern void swiotlb_print_info(void);
 extern void swiotlb_set_max_segment(unsigned int);
 
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->for_alloc;
+}
+#else
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   return NULL;
+}
+static inline bool swiotlb_free(struct device *dev, struct page *page,
+   size_t size)
+{
+   return false;
+}
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+   return false;
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a92465b4eb12..2de33e5d302b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   swiotlb_free(dev, page, size))
+   return;
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -86,6 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device 
*dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
    &phys_limit);
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   is_swiotlb_for_alloc(dev)) {
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   return NULL;
+   }
+   return page;
+   }
+
page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
@@ -142,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -155,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_swiotlb_for_alloc(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_swiotlb_for_alloc) is required, one must
+* set up another device coherent pool by shared-dma-pool and use
+* 

[PATCH v12 08/12] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-06-16 Thread Claire Chang
Add a new function, swiotlb_release_slots, to make the code reusable for
supporting different bounce buffer pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b59e689aa79d..688c6e0c43ff 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -555,27 +555,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -610,6 +598,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
	spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   swiotlb_release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.32.0.272.g935e593368-goog



[PATCH v12 07/12] swiotlb: Move alloc_size to swiotlb_find_slots

2021-06-16 Thread Claire Chang
Rename find_slots to swiotlb_find_slots and move the maintenance of
alloc_size to it for better code reusability later.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b5a9c4c0b4db..b59e689aa79d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -431,8 +431,8 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, phys_addr_t orig_addr,
-   size_t alloc_size)
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -487,8 +487,11 @@ static int find_slots(struct device *dev, phys_addr_t 
orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -529,7 +532,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
 
-   index = find_slots(dev, orig_addr, alloc_size + offset);
+   index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
if (index == -1) {
if (!(attrs & DMA_ATTR_NO_WARN))
dev_warn_ratelimited(dev,
@@ -543,11 +546,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.32.0.272.g935e593368-goog



[PATCH v12 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-16 Thread Claire Chang
Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
use it to determine whether to bounce the data or not. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 11 +++
 kernel/dma/direct.c |  2 +-
 kernel/dma/direct.h |  2 +-
 kernel/dma/swiotlb.c|  4 
 4 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index dd1c30a83058..8d8855c77d9a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
  * unmap calls.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
+ * @force_bounce: %true if swiotlb bouncing is forced
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -94,6 +95,7 @@ struct io_tlb_mem {
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+   bool force_bounce;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->force_bounce;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
return false;
 }
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+   return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..a92465b4eb12 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..4632b0f4f72e 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device 
*dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (is_swiotlb_force_bounce(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 101abeb0a57d..b5a9c4c0b4db 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->end = mem->start + bytes;
mem->index = 0;
mem->late_alloc = late_alloc;
+
+   if (swiotlb_force == SWIOTLB_FORCE)
+   mem->force_bounce = true;
+
spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-- 
2.32.0.272.g935e593368-goog
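
A hedged sketch of what this buys on the map path (helper name illustrative;
it collapses the two checks in the kernel/dma/direct.h hunk above into one
predicate): bouncing becomes a per-device decision rather than a global one,
so a later restricted pool can force it for just the devices bound to that
pool.

static inline bool dma_should_bounce(struct device *dev,
				     dma_addr_t dma_addr, size_t size)
{
	/* Bounce when the device's pool demands it, or when the
	 * device cannot address the buffer directly. */
	return is_swiotlb_force_bounce(dev) ||
	       !dma_capable(dev, dma_addr, size, true);
}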



[PATCH v12 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-16 Thread Claire Chang
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index a9d65fc8aa0e..4b7afa0fc85d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(obj->base.dev->dev)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 9662522aa066..be15bfd9e0ee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(dev->dev);
 #endif
 
ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..0d56985bfe81 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct 
pcifront_device *pdev)
 
spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+   if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
err = pci_xen_swiotlb_init_late();
if (err)
dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d1f3d95881cd..dd1c30a83058 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct device 
*dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index a9f5c08dd94a..101abeb0a57d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -663,9 +663,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return dev->dma_io_tlb_mem != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.32.0.272.g935e593368-goog
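
A usage sketch modeled on the i915 hunk above (variable names illustrative,
simplified): when the device may bounce through swiotlb, cap the scatterlist
segment size to what a bounce buffer can map.

	unsigned int max_segment = UINT_MAX;

	/* dev: the struct device doing the DMA (illustrative) */
	if (is_swiotlb_active(dev))
		max_segment = swiotlb_max_segment();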



[PATCH v12 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-16 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  7 ---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3087d9fa6065..10997ef541f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
*dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
*dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct device 
*dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..d1f3d95881cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include <linux/device.h>
#include <linux/dma-direction.h>
#include <linux/init.h>
#include <linux/types.h>
@@ -101,9 +102,9 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_device(dev, paddr, sg->length,

[PATCH v12 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-16 Thread Claire Chang
Always keep a pointer to the swiotlb pool in use in struct device. This
helps simplify the code once other pools are added.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 drivers/base/core.c| 4 
 include/linux/device.h | 4 
 kernel/dma/swiotlb.c   | 8 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index f29839382f81..cb3123e3954d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
#include <linux/netdevice.h>
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
+#include <linux/swiotlb.h>
#include <linux/sysfs.h>
#include <linux/dma-map-ops.h> /* for dma_default_coherent */
 
@@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
+#ifdef CONFIG_SWIOTLB
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..240d652a0696 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -518,6 +519,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index af416bcd1914..a9f5c08dd94a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -339,7 +339,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
size,
   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
phys_addr_t orig_addr = mem->slots[index].orig_addr;
@@ -430,7 +430,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -507,7 +507,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -558,7 +558,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
unsigned long flags;
unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-- 
2.32.0.272.g935e593368-goog
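
A sketch of where this pointer leads later in the series (helper name
illustrative; the restricted-pool patches do the equivalent from the
reserved-memory init path): a device bound to its own pool simply overrides
the default, and every swiotlb helper that takes a struct device follows it.

static int bind_restricted_pool(struct device *dev, struct io_tlb_mem *pool)
{
	if (!pool)
		return -EINVAL;
	/* All per-device swiotlb lookups now resolve to this pool. */
	dev->dma_io_tlb_mem = pool;
	return 0;
}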



[PATCH v12 02/12] swiotlb: Refactor swiotlb_create_debugfs

2021-06-16 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 3ba0f08a39a1..af416bcd1914 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -670,19 +670,26 @@ bool is_swiotlb_active(void)
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
-   if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+   if (mem) {
+   mem->debugfs = debugfs_dir;
+   swiotlb_create_debugfs_files(mem);
+   }
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.32.0.272.g935e593368-goog
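
A sketch of the reuse this enables (per the v4 changelog note about using
rmem->name for the debugfs entry; simplified): a later per-pool init path
hangs its own io_tlb_nslabs/io_tlb_used files under the same top-level
swiotlb directory.

	/* rmem: the struct reserved_mem backing this pool (illustrative) */
	mem->debugfs = debugfs_create_dir(rmem->name, debugfs_dir);
	swiotlb_create_debugfs_files(mem);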



[PATCH v12 01/12] swiotlb: Refactor swiotlb init functions

2021-06-16 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
---
 kernel/dma/swiotlb.c | 49 ++--
 1 file changed, 24 insertions(+), 25 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 52e2ac526757..3ba0f08a39a1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+   spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -297,20 +308,8 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.32.0.272.g935e593368-goog
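
A hedged usage sketch (sizes illustrative): with both init paths funneling
through swiotlb_init_io_tlb_mem(), a late caller only has to allocate the
table itself and hand it over.

	unsigned long nslabs = 1024;	/* illustrative: a 2 MiB bounce table */
	char *tlb = (char *)__get_free_pages(GFP_KERNEL | __GFP_NOWARN,
					     get_order(nslabs << IO_TLB_SHIFT));

	/* swiotlb_late_init_with_tbl() returns 0 on success. */
	if (tlb && swiotlb_late_init_with_tbl(tlb, nslabs))
		free_pages((unsigned long)tlb,
			   get_order(nslabs << IO_TLB_SHIFT));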



[PATCH v12 00/12] Restricted DMA

2021-06-16 Thread Claire Chang
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).

[1a] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] 
https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] 
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v12:
Split is_dev_swiotlb_force into is_swiotlb_force_bounce (patch 06/12) and
is_swiotlb_for_alloc (patch 09/12)

v11:
- Rebase against swiotlb devel/for-linus-5.14
- s/mempry/memory/g
- exchange the order of patch 09/12 and 10/12
https://lore.kernel.org/patchwork/cover/1447216/

v10:
Address the comments in v9 to
  - fix the dev->dma_io_tlb_mem assignment
  - propagate swiotlb_force setting into io_tlb_default_mem->force
  - move set_memory_decrypted out of swiotlb_init_io_tlb_mem
  - move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
  - add swiotlb_ prefix to find_slots and release_slots
  - merge the 3 alloc/free related patches
  - move the CONFIG_DMA_RESTRICTED_POOL later
https://lore.kernel.org/patchwork/cover/1446882/

v9:
Address the comments in v7 to
  - set swiotlb active pool to dev->dma_io_tlb_mem
  - get rid of get_io_tlb_mem
  - dig out the device struct for is_swiotlb_active
  - move debugfs_create_dir out of swiotlb_create_debugfs
  - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
  - use IS_ENABLED in kernel/dma/direct.c
  - fix redefinition of 'of_dma_set_restricted_buffer'
https://lore.kernel.org/patchwork/cover/1445081/

v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
  kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
  rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.
https://lore.kernel.org/patchwork/cover/1437112/

v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/

v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/


Claire Chang (12):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  swiotlb: Move alloc_size to swiotlb_find_slots
  swiotlb: Refactor swiotlb_tbl_unmap_single
  swiotlb: Add restricted DMA alloc/free support
  swiotlb: Add restricted DMA pool initialization
  dt-bindings: of: Add restricted DMA pool
  of: Add plumbing for restricted DMA pool

 .../reserved-memory/reserved-memory.txt   |  36 ++-
 drivers/base/core.c   |   4 +
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-

[powerpc:merge] BUILD SUCCESS 77fe1f3ccbe0cdc6f386aef522b043c52196d4d2

2021-06-16 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git merge
branch HEAD: 77fe1f3ccbe0cdc6f386aef522b043c52196d4d2  Automatic merge of 'next' into merge (2021-06-15 23:52)

elapsed time: 873m

configs tested: 125
configs skipped: 2

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm          defconfig
arm64        allyesconfig
arm64        defconfig
arm          allyesconfig
arm          allmodconfig
powerpc      mpc866_ads_defconfig
ia64         generic_defconfig
arm          versatile_defconfig
arm          lpc32xx_defconfig
arm          socfpga_defconfig
powerpc      sbc8548_defconfig
arm          colibri_pxa270_defconfig
m68k         stmark2_defconfig
powerpc      icon_defconfig
powerpc      chrp32_defconfig
xtensa       smp_lx200_defconfig
mips         ath79_defconfig
powerpc      iss476-smp_defconfig
xtensa       generic_kc705_defconfig
powerpc      mpc8540_ads_defconfig
arm          oxnas_v6_defconfig
s390         defconfig
arm          multi_v7_defconfig
powerpc      g5_defconfig
arm          hisi_defconfig
xtensa       iss_defconfig
powerpc      kmeter1_defconfig
nds32        defconfig
powerpc      lite5200b_defconfig
arm          neponset_defconfig
arm          zeus_defconfig
m68k         atari_defconfig
arm          mvebu_v5_defconfig
ia64         zx1_defconfig
powerpc      ksi8560_defconfig
ia64         alldefconfig
powerpc      ep88xc_defconfig
arm64        alldefconfig
arc          vdk_hs38_smp_defconfig
m68k         amcore_defconfig
microblaze   mmu_defconfig
mips         cu1000-neo_defconfig
m68k         m5275evb_defconfig
arm          keystone_defconfig
arm          palmz72_defconfig
powerpc      mpc7448_hpc2_defconfig
powerpc      fsp2_defconfig
s390         alldefconfig
riscv        nommu_k210_sdcard_defconfig
s390         debug_defconfig
powerpc      mpc832x_mds_defconfig
sparc        sparc64_defconfig
riscv        rv32_defconfig
powerpc      mpc85xx_cds_defconfig
x86_64       allnoconfig
ia64         allmodconfig
ia64         defconfig
ia64         allyesconfig
m68k         allmodconfig
m68k         defconfig
m68k         allyesconfig
nios2        defconfig
arc          allyesconfig
nds32        allnoconfig
nios2        allyesconfig
csky         defconfig
alpha        defconfig
alpha        allyesconfig
xtensa       allyesconfig
h8300        allyesconfig
arc          defconfig
sh           allmodconfig
parisc       defconfig
s390         allyesconfig
s390         allmodconfig
parisc       allyesconfig
i386         allyesconfig
sparc        allyesconfig
sparc        defconfig
i386         defconfig
mips         allyesconfig
mips         allmodconfig
powerpc      allyesconfig
powerpc      allmodconfig
powerpc      allnoconfig
i386         randconfig-a002-20210615
i386         randconfig-a006-20210615
i386         randconfig-a004-20210615
i386         randconfig-a001-20210615
i386         randconfig-a005-20210615
i386         randconfig-a003-20210615
x86_64       randconfig-a001-20210615
x86_64       randconfig-a004-20210615
x86_64       randconfig-a002-20210615
x86_64       randconfig-a003-20210615
x86_64       randconfig-a006-20210615
x86_64       randconfig-a005-20210615
i386         randconfig-a015-20210615
i386         randconfig-a013-20210615
i386         randconfig-a016-20210615
i386