Re: [PATCH v3 5/9] mm: remove CONFIG_DISCONTIGMEM

2021-06-11 Thread Mike Rapoport
On Fri, Jun 11, 2021 at 01:53:48PM -0700, Stephen Brennan wrote:
> Mike Rapoport  writes:
> > From: Mike Rapoport 
> >
> > There are no architectures that support DISCONTIGMEM left.
> >
> > Remove the configuration option and the dead code it was guarding in the
> > generic memory management code.
> >
> > Signed-off-by: Mike Rapoport 
> > ---
> >  include/asm-generic/memory_model.h | 37 --
> >  include/linux/mmzone.h |  8 ---
> >  mm/Kconfig | 25 +++-
> >  mm/page_alloc.c| 13 ---
> >  4 files changed, 12 insertions(+), 71 deletions(-)
> >
> > diff --git a/include/asm-generic/memory_model.h 
> > b/include/asm-generic/memory_model.h
> > index 7637fb46ba4f..a2c8ed60233a 100644
> > --- a/include/asm-generic/memory_model.h
> > +++ b/include/asm-generic/memory_model.h
> > @@ -6,47 +6,18 @@
> >  
> >  #ifndef __ASSEMBLY__
> >  
> > +/*
> > + * supports 3 memory models.
> > + */
> 
> This comment could either be updated to reflect 2 memory models, or
> removed entirely.

I counted SPARSE and SPARSE_VMEMMAP as 2.

The code below has three clauses: one for FLATMEM, one for SPARSE and one
for VMEMMAP.
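For reference, after this patch the pfn <-> page conversion in memory_model.h
boils down to three clauses along these lines (abridged from the file itself):

#if defined(CONFIG_FLATMEM)
/* pfn indexes directly into the single mem_map array */
#define __pfn_to_page(pfn)	(mem_map + ((pfn) - ARCH_PFN_OFFSET))
#elif defined(CONFIG_SPARSEMEM_VMEMMAP)
/* vmemmap is a virtually contiguous array of struct page */
#define __pfn_to_page(pfn)	(vmemmap + (pfn))
#elif defined(CONFIG_SPARSEMEM)
/* look up the pfn's mem_section, then index into that section's mem_map */
#endif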
 
> Thanks,
> Stephen
> 
> >  #if defined(CONFIG_FLATMEM)
> >  
> >  #ifndef ARCH_PFN_OFFSET
> > #define ARCH_PFN_OFFSET	(0UL)
> >  #endif
> >  
> > -#elif defined(CONFIG_DISCONTIGMEM)
> > -
> > -#ifndef arch_pfn_to_nid
> > -#define arch_pfn_to_nid(pfn)   pfn_to_nid(pfn)
> > -#endif
> > -
> > -#ifndef arch_local_page_offset
> > -#define arch_local_page_offset(pfn, nid)   \
> > -   ((pfn) - NODE_DATA(nid)->node_start_pfn)
> > -#endif
> > -
> > -#endif /* CONFIG_DISCONTIGMEM */
> > -
> > -/*
> > - * supports 3 memory models.
> > - */
> > -#if defined(CONFIG_FLATMEM)
> > -
> >  #define __pfn_to_page(pfn) (mem_map + ((pfn) - ARCH_PFN_OFFSET))
> >  #define __page_to_pfn(page)((unsigned long)((page) - mem_map) + \
> >  ARCH_PFN_OFFSET)
> > -#elif defined(CONFIG_DISCONTIGMEM)
> > -
> > -#define __pfn_to_page(pfn) \
> > -({ unsigned long __pfn = (pfn);\
> > -   unsigned long __nid = arch_pfn_to_nid(__pfn);  \
> > -   NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
> > -})
> > -
> > -#define __page_to_pfn(pg)  \
> > -({ const struct page *__pg = (pg); \
> > -   struct pglist_data *__pgdat = NODE_DATA(page_to_nid(__pg)); \
> > -   (unsigned long)(__pg - __pgdat->node_mem_map) + \
> > -__pgdat->node_start_pfn;   \
> > -})
> >  
> >  #elif defined(CONFIG_SPARSEMEM_VMEMMAP)
> >  
> > @@ -70,7 +41,7 @@
> > struct mem_section *__sec = __pfn_to_section(__pfn);\
> > __section_mem_map_addr(__sec) + __pfn;  \
> >  })
> > -#endif /* CONFIG_FLATMEM/DISCONTIGMEM/SPARSEMEM */
> > +#endif /* CONFIG_FLATMEM/SPARSEMEM */
> >  
> >  /*
> >   * Convert a physical address to a Page Frame Number and back
> > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> > index 0d53eba1c383..700032e99419 100644
> > --- a/include/linux/mmzone.h
> > +++ b/include/linux/mmzone.h
> > @@ -738,10 +738,12 @@ struct zonelist {
> > struct zoneref _zonerefs[MAX_ZONES_PER_ZONELIST + 1];
> >  };
> >  
> > -#ifndef CONFIG_DISCONTIGMEM
> > -/* The array of struct pages - for discontigmem use pgdat->lmem_map */
> > +/*
> > + * The array of struct pages for flatmem.
> > + * It must be declared for SPARSEMEM as well because there are 
> > configurations
> > + * that rely on that.
> > + */
> >  extern struct page *mem_map;
> > -#endif
> >  
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >  struct deferred_split {
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 02d44e3420f5..218b96ccc84a 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -19,7 +19,7 @@ choice
> >  
> >  config FLATMEM_MANUAL
> > bool "Flat Memory"
> > -   depends on !(ARCH_DISCONTIGMEM_ENABLE || ARCH_SPARSEMEM_ENABLE) || 
> > ARCH_FLATMEM_ENABLE
> > +   depends on !ARCH_SPARSEMEM_ENABLE || ARCH_FLATMEM_ENABLE
> > help
> >   This option is best suited for non-NUMA systems with
> >   flat address space. The FLATMEM is the most efficient
> > @@ -32,21 +32,6 @@ config FLATMEM_MANUAL
> >  
> >   If unsure, choose this option (Flat Memory) over any other.
> >  
> > -config DISCONTIGMEM_MANUAL
> > -   bool "Discontiguous Memory"
> > -   depends on ARCH_DISCONTIGMEM_ENABLE
> > -   help
> > - This option provides enhanced support for discontiguous
> > - memory systems, over FLATMEM.  These systems have holes
> > - in their physical address spaces, and this option provides
> > - more efficient handling of these holes.
> > -
> > - Although "Discontiguous Memory" is still used by several
> > - architectures, it is considered deprecated in favor of
> > - "Sparse Memory".
> > -
> > - If unsure, 

Re: [PATCH 09/16] ps3disk: use memcpy_{from,to}_bvec

2021-06-11 Thread Ira Weiny
On Fri, Jun 11, 2021 at 08:53:38AM +0200, Christoph Hellwig wrote:
> On Tue, Jun 08, 2021 at 06:48:22PM -0700, Ira Weiny wrote:
> > I'm still not 100% sure that these flushes are needed, but they are not
> > no-ops on every arch.  Would it be best to preserve them after the
> > memcpy_to/from_bvec()?
> > 
> > Same thing in patch 11 and 14.
> 
> To me it seems kunmap_local should basically always call the equivalent
> of flush_kernel_dcache_page.  parisc does this through
> kunmap_flush_on_unmap, but none of the other architectures with VIVT
> caches or other coherency issues does.
> 
> Does anyone have a history or other insights here?

I went digging into the current callers of flush_kernel_dcache_page() other
than this one, to see if adding kunmap_flush_on_unmap() to the other arches
would cause any problems.

In particular this call site stood out because it is not always called?!

void sg_miter_stop(struct sg_mapping_iter *miter)
{
...
if ((miter->__flags & SG_MITER_TO_SG) &&
!PageSlab(miter->page))
flush_kernel_dcache_page(miter->page);
...
}

Looking at 

3d77b50c5874 lib/scatterlist.c: don't flush_kernel_dcache_page on slab page[1]

It seems the restrictions they are quoting for the page are completely out of
date.  I don't see any current way for a VM_BUG_ON() to be triggered.  So is
this code really necessary?

More recently this was added:

7e34e0bbc644 crypto: omap-crypto - fix userspace copied buffer access

I'm CC'ing Tero and Herbert to see why they added the SLAB check.


Then we have interesting comments like this...

...
/* This can go away once MIPS implements
 * flush_kernel_dcache_page */
flush_dcache_page(miter->page);
...


And some users optimizing.

...
/* discard mappings */
if (direction == DMA_FROM_DEVICE)
flush_kernel_dcache_page(sg_page(sg));  
...

The uses in fs/exec.c are the most straightforward and can simply rely on the
kunmap() code to replace the call.

In conclusion I don't see a lot of reason to not define kunmap_flush_on_unmap()
on arm, csky, mips, nds32, and sh...  Then remove all the
flush_kernel_dcache_page() call sites and the documentation...

Something like [2] below...  Completely untested of course...

Ira


[1] commit 3d77b50c5874b7e923be946ba793644f82336b75
Author: Ming Lei 
Date:   Thu Oct 31 16:34:17 2013 -0700

lib/scatterlist.c: don't flush_kernel_dcache_page on slab page

Commit b1adaf65ba03 ("[SCSI] block: add sg buffer copy helper
functions") introduces two sg buffer copy helpers, and calls
flush_kernel_dcache_page() on pages in SG list after these pages are
written to.

Unfortunately, the commit may introduce a potential bug:

 - Before sending some SCSI commands, kmalloc() buffer may be passed to
   block layer, so flush_kernel_dcache_page() can see a slab page
   finally

 - According to cachetlb.txt, flush_kernel_dcache_page() is only called
   on "a user page", which surely can't be a slab page.

 - ARCH's implementation of flush_kernel_dcache_page() may use page
   mapping information to do optimization so page_mapping() will see the
   slab page, then VM_BUG_ON() is triggered.

Aaro Koskinen reported the bug on ARM/kirkwood when DEBUG_VM is enabled,
and this patch fixes the bug by adding test of '!PageSlab(miter->page)'
before calling flush_kernel_dcache_page().


[2]


From 70b537c31d16c2a5e4e92c35895e8c59303bcbef Mon Sep 17 00:00:00 2001
From: Ira Weiny 
Date: Fri, 11 Jun 2021 18:24:27 -0700
Subject: [PATCH] COMPLETELY UNTESTED: highmem: Remove direct calls to 
flush_kernel_dcache_page

When to call flush_kernel_dcache_page() is confusing and inconsistent.  For
architectures which may need to do something, the core kmap code should be
leveraged to handle this when direct kernel access is needed.

Like parisc, define kunmap_flush_on_unmap() to be called when pages are
unmapped on arm, csky, mips, nds32, and sh.

Remove all direct calls to flush_kernel_dcache_page() and let the
kunmap() code do this for the users.


Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-c...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Signed-off-by: Ira Weiny 
---
 Documentation/core-api/cachetlb.rst  | 13 -
 arch/arm/include/asm/cacheflush.h|  6 ++
 arch/csky/abiv1/inc/abi/cacheflush.h |  6 ++
 arch/mips/include/asm/cacheflush.h   |  6 ++
 arch/nds32/include/asm/cacheflush.h  |  6 ++
 arch/sh/include/asm/cacheflush.h |  6 ++
 drivers/crypto/omap-crypto.c |  3 ---
 drivers/mmc/host/mmc_spi.c   |  3 ---
 drivers/scsi/aacraid/aachba.c|  1 -
 fs/exec.c|  3 
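For reference, the per-arch hunks in [2] would presumably mirror what parisc
already does in its cacheflush.h; an untested sketch for one arch (whether
cacheflush.h can pull in kmap_to_page() without a header loop is an open
question) might be:

/* arch/arm/include/asm/cacheflush.h (sketch, modeled on parisc) */
#define ARCH_HAS_FLUSH_ON_KUNMAP
static inline void kunmap_flush_on_unmap(void *addr)
{
	/* flush the kernel alias before the temporary mapping goes away */
	flush_kernel_dcache_page(kmap_to_page(addr));
}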

[PATCH] usb: gadget: fsl: properly remove remnant of MXC support

2021-06-11 Thread Li Yang
Commit a390bef7db1f ("usb: gadget: fsl_mxc_udc: Remove the driver")
didn't remove all of the MXC-related code, which can cause a build problem
for LS1021 when it is enabled again in Kconfig.  This patch removes all the
remnants.

Signed-off-by: Li Yang 
---
 drivers/usb/gadget/udc/fsl_udc_core.c | 36 +--
 drivers/usb/gadget/udc/fsl_usb2_udc.h | 19 --
 2 files changed, 6 insertions(+), 49 deletions(-)

diff --git a/drivers/usb/gadget/udc/fsl_udc_core.c 
b/drivers/usb/gadget/udc/fsl_udc_core.c
index 2b357b3f64c0..29fcb9b461d7 100644
--- a/drivers/usb/gadget/udc/fsl_udc_core.c
+++ b/drivers/usb/gadget/udc/fsl_udc_core.c
@@ -36,7 +36,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #include 
@@ -323,13 +322,11 @@ static int dr_controller_setup(struct fsl_udc *udc)
fsl_writel(tmp, &dr_regs->endptctrl[ep_num]);
}
/* Config control enable i/o output, cpu endian register */
-#ifndef CONFIG_ARCH_MXC
if (udc->pdata->have_sysif_regs) {
ctrl = __raw_readl(&usb_sys_regs->control);
ctrl |= USB_CTRL_IOENB;
__raw_writel(ctrl, &usb_sys_regs->control);
}
-#endif
 
 #if defined(CONFIG_PPC32) && !defined(CONFIG_NOT_COHERENT_CACHE)
/* Turn on cache snooping hardware, since some PowerPC platforms
@@ -2153,7 +2150,6 @@ static int fsl_proc_read(struct seq_file *m, void *v)
tmp_reg = fsl_readl(&dr_regs->endpointprime);
seq_printf(m, "EP Prime Reg = [0x%x]\n\n", tmp_reg);
 
-#ifndef CONFIG_ARCH_MXC
if (udc->pdata->have_sysif_regs) {
tmp_reg = usb_sys_regs->snoop1;
seq_printf(m, "Snoop1 Reg : = [0x%x]\n\n", tmp_reg);
@@ -2161,7 +2157,6 @@ static int fsl_proc_read(struct seq_file *m, void *v)
tmp_reg = usb_sys_regs->control;
seq_printf(m, "General Control Reg : = [0x%x]\n\n", tmp_reg);
}
-#endif
 
/* --fsl_udc, fsl_ep, fsl_request structure information - */
ep = &udc->eps[0];
@@ -2412,28 +2407,21 @@ static int fsl_udc_probe(struct platform_device *pdev)
 */
if (pdata->init && pdata->init(pdev)) {
ret = -ENODEV;
-   goto err_iounmap_noclk;
+   goto err_iounmap;
}
 
/* Set accessors only after pdata->init() ! */
fsl_set_accessors(pdata);
 
-#ifndef CONFIG_ARCH_MXC
if (pdata->have_sysif_regs)
usb_sys_regs = (void *)dr_regs + USB_DR_SYS_OFFSET;
-#endif
-
-   /* Initialize USB clocks */
-   ret = fsl_udc_clk_init(pdev);
-   if (ret < 0)
-   goto err_iounmap_noclk;
 
/* Read Device Controller Capability Parameters register */
dccparams = fsl_readl(&dr_regs->dccparams);
if (!(dccparams & DCCPARAMS_DC)) {
ERR("This SOC doesn't support device role\n");
ret = -ENODEV;
-   goto err_iounmap;
+   goto err_exit;
}
/* Get max device endpoints */
/* DEN is bidirectional ep number, max_ep doubles the number */
@@ -2442,7 +2430,7 @@ static int fsl_udc_probe(struct platform_device *pdev)
ret = platform_get_irq(pdev, 0);
if (ret <= 0) {
ret = ret ? : -ENODEV;
-   goto err_iounmap;
+   goto err_exit;
}
udc_controller->irq = ret;
 
@@ -2451,7 +2439,7 @@ static int fsl_udc_probe(struct platform_device *pdev)
if (ret != 0) {
ERR("cannot request irq %d err %d\n",
udc_controller->irq, ret);
-   goto err_iounmap;
+   goto err_exit;
}
 
/* Initialize the udc structure including QH member and other member */
@@ -2467,10 +2455,6 @@ static int fsl_udc_probe(struct platform_device *pdev)
dr_controller_setup(udc_controller);
}
 
-   ret = fsl_udc_clk_finalize(pdev);
-   if (ret)
-   goto err_free_irq;
-
/* Setup gadget structure */
udc_controller->gadget.ops = &fsl_gadget_ops;
udc_controller->gadget.max_speed = USB_SPEED_HIGH;
@@ -2530,11 +2514,10 @@ static int fsl_udc_probe(struct platform_device *pdev)
dma_pool_destroy(udc_controller->td_pool);
 err_free_irq:
free_irq(udc_controller->irq, udc_controller);
-err_iounmap:
+err_exit:
if (pdata->exit)
pdata->exit(pdev);
-   fsl_udc_clk_release();
-err_iounmap_noclk:
+err_iounmap:
iounmap(dr_regs);
 err_release_mem_region:
if (pdata->operating_mode == FSL_USB2_DR_DEVICE)
@@ -2561,8 +2544,6 @@ static int fsl_udc_remove(struct platform_device *pdev)
udc_controller->done = &done;
usb_del_gadget_udc(&udc_controller->gadget);
 
-   fsl_udc_clk_release();
-
/* DR has been stopped in usb_gadget_unregister_driver() */
remove_proc_file();
 
@@ -2677,10 +2658,6 @@ static int fsl_udc_otg_resume(struct device *dev)
 

Re: [PATCH v3 5/9] mm: remove CONFIG_DISCONTIGMEM

2021-06-11 Thread Stephen Brennan
Mike Rapoport  writes:
> From: Mike Rapoport 
>
> There are no architectures that support DISCONTIGMEM left.
>
> Remove the configuration option and the dead code it was guarding in the
> generic memory management code.
>
> Signed-off-by: Mike Rapoport 
> ---
>  include/asm-generic/memory_model.h | 37 --
>  include/linux/mmzone.h |  8 ---
>  mm/Kconfig | 25 +++-
>  mm/page_alloc.c| 13 ---
>  4 files changed, 12 insertions(+), 71 deletions(-)
>
> diff --git a/include/asm-generic/memory_model.h 
> b/include/asm-generic/memory_model.h
> index 7637fb46ba4f..a2c8ed60233a 100644
> --- a/include/asm-generic/memory_model.h
> +++ b/include/asm-generic/memory_model.h
> @@ -6,47 +6,18 @@
>  
>  #ifndef __ASSEMBLY__
>  
> +/*
> + * supports 3 memory models.
> + */

This comment could either be updated to reflect 2 memory models, or
removed entirely.

Thanks,
Stephen

>  #if defined(CONFIG_FLATMEM)
>  
>  #ifndef ARCH_PFN_OFFSET
>  #define ARCH_PFN_OFFSET  (0UL)
>  #endif
>  
> -#elif defined(CONFIG_DISCONTIGMEM)
> -
> -#ifndef arch_pfn_to_nid
> -#define arch_pfn_to_nid(pfn) pfn_to_nid(pfn)
> -#endif
> -
> -#ifndef arch_local_page_offset
> -#define arch_local_page_offset(pfn, nid) \
> - ((pfn) - NODE_DATA(nid)->node_start_pfn)
> -#endif
> -
> -#endif /* CONFIG_DISCONTIGMEM */
> -
> -/*
> - * supports 3 memory models.
> - */
> -#if defined(CONFIG_FLATMEM)
> -
>  #define __pfn_to_page(pfn)   (mem_map + ((pfn) - ARCH_PFN_OFFSET))
>  #define __page_to_pfn(page)  ((unsigned long)((page) - mem_map) + \
>ARCH_PFN_OFFSET)
> -#elif defined(CONFIG_DISCONTIGMEM)
> -
> -#define __pfn_to_page(pfn)   \
> -({   unsigned long __pfn = (pfn);\
> - unsigned long __nid = arch_pfn_to_nid(__pfn);  \
> - NODE_DATA(__nid)->node_mem_map + arch_local_page_offset(__pfn, __nid);\
> -})
> -
> -#define __page_to_pfn(pg)\
> -({   const struct page *__pg = (pg); \
> - struct pglist_data *__pgdat = NODE_DATA(page_to_nid(__pg)); \
> - (unsigned long)(__pg - __pgdat->node_mem_map) + \
> -  __pgdat->node_start_pfn;   \
> -})
>  
>  #elif defined(CONFIG_SPARSEMEM_VMEMMAP)
>  
> @@ -70,7 +41,7 @@
>   struct mem_section *__sec = __pfn_to_section(__pfn);\
>   __section_mem_map_addr(__sec) + __pfn;  \
>  })
> -#endif /* CONFIG_FLATMEM/DISCONTIGMEM/SPARSEMEM */
> +#endif /* CONFIG_FLATMEM/SPARSEMEM */
>  
>  /*
>   * Convert a physical address to a Page Frame Number and back
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 0d53eba1c383..700032e99419 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -738,10 +738,12 @@ struct zonelist {
>   struct zoneref _zonerefs[MAX_ZONES_PER_ZONELIST + 1];
>  };
>  
> -#ifndef CONFIG_DISCONTIGMEM
> -/* The array of struct pages - for discontigmem use pgdat->lmem_map */
> +/*
> + * The array of struct pages for flatmem.
> + * It must be declared for SPARSEMEM as well because there are configurations
> + * that rely on that.
> + */
>  extern struct page *mem_map;
> -#endif
>  
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  struct deferred_split {
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 02d44e3420f5..218b96ccc84a 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -19,7 +19,7 @@ choice
>  
>  config FLATMEM_MANUAL
>   bool "Flat Memory"
> - depends on !(ARCH_DISCONTIGMEM_ENABLE || ARCH_SPARSEMEM_ENABLE) || 
> ARCH_FLATMEM_ENABLE
> + depends on !ARCH_SPARSEMEM_ENABLE || ARCH_FLATMEM_ENABLE
>   help
> This option is best suited for non-NUMA systems with
> flat address space. The FLATMEM is the most efficient
> @@ -32,21 +32,6 @@ config FLATMEM_MANUAL
>  
> If unsure, choose this option (Flat Memory) over any other.
>  
> -config DISCONTIGMEM_MANUAL
> - bool "Discontiguous Memory"
> - depends on ARCH_DISCONTIGMEM_ENABLE
> - help
> -   This option provides enhanced support for discontiguous
> -   memory systems, over FLATMEM.  These systems have holes
> -   in their physical address spaces, and this option provides
> -   more efficient handling of these holes.
> -
> -   Although "Discontiguous Memory" is still used by several
> -   architectures, it is considered deprecated in favor of
> -   "Sparse Memory".
> -
> -   If unsure, choose "Sparse Memory" over this option.
> -
>  config SPARSEMEM_MANUAL
>   bool "Sparse Memory"
>   depends on ARCH_SPARSEMEM_ENABLE
> @@ -62,17 +47,13 @@ config SPARSEMEM_MANUAL
>  
>  endchoice
>  
> -config DISCONTIGMEM
> - def_bool y
> - depends on (!SELECT_MEMORY_MODEL && ARCH_DISCONTIGMEM_ENABLE) || 
> DISCONTIGMEM_MANUAL
> -
>  config SPARSEMEM
>   def_bool y
>   depends on 

Re: [PATCH] fs: btrfs: Disable BTRFS on platforms having 256K pages

2021-06-11 Thread Qu Wenruo




On 2021/6/10 1:23 PM, Christophe Leroy wrote:

With a config having PAGE_SIZE set to 256K, BTRFS build fails
with the following message

  include/linux/compiler_types.h:326:38: error: call to 
'__compiletime_assert_791' declared with attribute error: BUILD_BUG_ON failed: 
(BTRFS_MAX_COMPRESSED % PAGE_SIZE) != 0

BTRFS_MAX_COMPRESSED being 128K, BTRFS cannot support platforms with
256K pages for the time being.

There are two platforms that can select 256K pages:
  - hexagon
  - powerpc

Disable BTRFS when 256K page size is selected.

Reported-by: kernel test robot 
Signed-off-by: Christophe Leroy 
---
  fs/btrfs/Kconfig | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index 68b95ad82126..520a0f6a7d9e 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -18,6 +18,8 @@ config BTRFS_FS
select RAID6_PQ
select XOR_BLOCKS
select SRCU
+   depends on !PPC_256K_PAGES  # powerpc
+   depends on !PAGE_SIZE_256KB # hexagon


I'm OK with disabling page sizes other than 4K, 16K, 32K, 64K for now.

Although for other reasons.

Not only because of the BUILD_BUG_ON(), but because btrfs only supports
4K, 16K, 32K, 64K sectorsizes, and requires PAGE_SIZE == sectorsize.

Although we're adding subpage support, the subpage support only comes
with 4K sectorsize on 64K page size.

Until a variable length version is introduced, 256K/128K page sizes won't
be supported.

Thus I'm fine with disabling BTRFS for any arch outside of the supported
page sizes for now.
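For illustration, an explicit compile-time guard along the lines David
suggests elsewhere in the thread could look like the sketch below; the exact
placement and constraint set are assumptions, not what btrfs does today:

/* illustrative: fail the build loudly for unsupported page sizes instead
 * of tripping over the BTRFS_MAX_COMPRESSED BUILD_BUG_ON() as a side effect */
#include <linux/build_bug.h>
#include <linux/sizes.h>

static_assert(PAGE_SIZE == SZ_4K || PAGE_SIZE == SZ_16K ||
	      PAGE_SIZE == SZ_32K || PAGE_SIZE == SZ_64K,
	      "btrfs requires a 4K/16K/32K/64K page size");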

Thanks,
Qu


help
  Btrfs is a general purpose copy-on-write filesystem with extents,



Re: [PATCH] btrfs: Disable BTRFS on platforms having 256K pages

2021-06-11 Thread Chris Mason

> On Jun 11, 2021, at 9:21 AM, David Sterba  wrote:
> 
> On Fri, Jun 11, 2021 at 12:58:58PM +0000, Chris Mason wrote:
>>> On Jun 10, 2021, at 12:20 PM, David Sterba  wrote:
>>> On Thu, Jun 10, 2021 at 04:50:09PM +0200, Christophe Leroy wrote:
 On 10/06/2021 at 15:54, Chris Mason wrote:
>> On Jun 10, 2021, at 1:23 AM, Christophe Leroy 
>>  wrote:
>>> And there's no such thing as "just bump BTRFS_MAX_COMPRESSED to 256K".
>>> The constant is part of the on-disk format for lzo, and otherwise changing it
>>> would impact performance, so this would need proper evaluation.
>> 
>> Sorry, how is it baked into LZO?  It definitely will have performance 
>> implications, I agree there.
> 
> lzo_decompress_bio:
> 
> 	/*
> 	 * Compressed data header check.
> 	 *
> 	 * The real compressed size can't exceed the maximum extent length,
> 	 * and all pages should be used (whole unused page with just the
> 	 * segment header is not possible).  If this happens it means the
> 	 * compressed extent is corrupted.
> 	 */
> 	if (tot_len > min_t(size_t, BTRFS_MAX_COMPRESSED, srclen) ||
> 	    tot_len < srclen - PAGE_SIZE) {
> 		ret = -EUCLEAN;
> 		goto done;
> 	}

Ah I see, so an old kernel reading LZO data written with a larger max will get
upset.  Ok, fair enough.  So if we want to bump this for other reasons, we'll
need a separate LZO max size to maintain compatibility.
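A hypothetical sketch of that split (names and placement are illustrative):
pin the value old kernels check on the LZO read path at 128K, and let the
generic bound track the page size:

/* hypothetical: the LZO on-disk limit stays fixed for compatibility */
#define BTRFS_LZO_MAX_COMPRESSED	SZ_128K
#if PAGE_SHIFT > 17			/* pages larger than 128K */
#define BTRFS_MAX_COMPRESSED		PAGE_SIZE
#else
#define BTRFS_MAX_COMPRESSED		SZ_128K
#endif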

-chris

Re: [PATCH 1/2] powerpc/64: drop redundant definition of spin_until_cond

2021-06-11 Thread Sudeep Holla
On Fri, Jun 11, 2021 at 07:10:57PM +0000, Christophe Leroy wrote:
> From: Sudeep Holla 
> 
> linux/processor.h has exactly the same definition of spin_until_cond.
> Drop the redundant definition in asm/processor.h
>

Wow, you must be really good at ML archaeology; this must be at least
3+ years old. I found this when I wanted to use spin_until_cond. Thanks
anyway for digging up the original patch; nobody would have remembered even
if you had posted it fresh.

-- 
Regards,
Sudeep


[PATCH 2/2] powerpc/watchdog: include linux/processor.h for spin_until_cond

2021-06-11 Thread Christophe Leroy
From: Sudeep Holla 

This implementation uses spin_until_cond in wd_smp_lock while including
neither linux/processor.h nor asm/processor.h.

This patch includes linux/processor.h here for the spin_until_cond usage.

Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Signed-off-by: Sudeep Holla 
Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/watchdog.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kernel/watchdog.c b/arch/powerpc/kernel/watchdog.c
index c9a8f4781a10..a165635fd214 100644
--- a/arch/powerpc/kernel/watchdog.c
+++ b/arch/powerpc/kernel/watchdog.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include <linux/processor.h>
 #include 
 
 #include 
-- 
2.25.0



[PATCH 1/2] powerpc/64: drop redundant definition of spin_until_cond

2021-06-11 Thread Christophe Leroy
From: Sudeep Holla 

linux/processor.h has exactly the same definition of spin_until_cond.
Drop the redundant definition in asm/processor.h

Cc: Nicholas Piggin 
Cc: Michael Ellerman 
Signed-off-by: Sudeep Holla 
Signed-off-by: Christophe Leroy 
---
 That's just a rebase

 arch/powerpc/include/asm/processor.h | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/powerpc/include/asm/processor.h 
b/arch/powerpc/include/asm/processor.h
index 7bf8a15af224..0819854eeab9 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -339,17 +339,6 @@ static inline unsigned long __pack_fe01(unsigned int 
fpmode)
 
 #define spin_end() HMT_medium()
 
-#define spin_until_cond(cond)  \
-do {   \
-   if (unlikely(!(cond))) {\
-   spin_begin();   \
-   do {\
-   spin_cpu_relax();   \
-   } while (!(cond));  \
-   spin_end(); \
-   }   \
-} while (0)
-
 #endif
 
 /* Check that a certain kernel stack pointer is valid in task_struct p */
-- 
2.25.0
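For reference, the generic definition in include/linux/processor.h is
identical to the one removed above, and is guarded so an architecture can
still override it:

#ifndef spin_until_cond
#define spin_until_cond(cond)					\
do {								\
	if (unlikely(!(cond))) {				\
		spin_begin();					\
		do {						\
			spin_cpu_relax();			\
		} while (!(cond));				\
		spin_end();					\
	}							\
} while (0)
#endif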



[PATCH] powerpc/32: Display modules range in virtual memory layout

2021-06-11 Thread Christophe Leroy
book3s/32 and 8xx don't use vmalloc for modules.

Print the modules area at startup as part of the virtual memory layout:

[    0.000000] Kernel virtual memory layout:
[    0.000000]   * 0xffafc000..0xffffc000  : fixmap
[    0.000000]   * 0xc9000000..0xffafc000  : vmalloc & ioremap
[    0.000000]   * 0xb0000000..0xc0000000  : modules
[    0.000000] Memory: 118480K/131072K available (7152K kernel code, 2320K rwdata, 1328K rodata, 368K init, 854K bss, 12592K reserved, 0K cma-reserved)

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/mm/mem.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 77fce7aa7dc5..c3b4fdda7069 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -302,6 +302,10 @@ void __init mem_init(void)
ioremap_bot, IOREMAP_TOP);
pr_info("  * 0x%08lx..0x%08lx  : vmalloc & ioremap\n",
VMALLOC_START, VMALLOC_END);
+#ifdef MODULES_VADDR
+   pr_info("  * 0x%08lx..0x%08lx  : modules\n",
+   MODULES_VADDR, MODULES_END);
+#endif
 #endif /* CONFIG_PPC32 */
 }
 
-- 
2.25.0



Re: simplify gendisk and request_queue allocation for blk-mq based drivers

2021-06-11 Thread Jens Axboe
On 6/2/21 12:53 AM, Christoph Hellwig wrote:
> Hi all,
> 
this series is the second part of cleaning up lifetimes and allocation of
> the gendisk and request_queue structure.  It adds a new interface to
> allocate the disk and queue together for blk based drivers, and uses that
> in all drivers that do not have any caveats in their gendisk and
> request_queue lifetime rules.

Applied, thanks.

-- 
Jens Axboe



Re: [PATCH] btrfs: Disable BTRFS on platforms having 256K pages

2021-06-11 Thread Chris Mason

> On Jun 10, 2021, at 12:20 PM, David Sterba  wrote:
> 
> On Thu, Jun 10, 2021 at 04:50:09PM +0200, Christophe Leroy wrote:
>> 
>> 
>> On 10/06/2021 at 15:54, Chris Mason wrote:
>>> 
 On Jun 10, 2021, at 1:23 AM, Christophe Leroy 
  wrote:
 
 With a config having PAGE_SIZE set to 256K, BTRFS build fails
 with the following message
 
 include/linux/compiler_types.h:326:38: error: call to 
 '__compiletime_assert_791' declared with attribute error: BUILD_BUG_ON 
 failed: (BTRFS_MAX_COMPRESSED % PAGE_SIZE) != 0
 
 BTRFS_MAX_COMPRESSED being 128K, BTRFS cannot support platforms with
 256K pages for the time being.
 
 There are two platforms that can select 256K pages:
 - hexagon
 - powerpc
 
 Disable BTRFS when 256K page size is selected.
 
>>> 
>>> We’ll have other subpage blocksize concerns with 256K pages, but this 
>>> BTRFS_MAX_COMPRESSED #define is arbitrary.  It’s just trying to have an 
>>> upper bound on the amount of memory we’ll need to uncompress a single 
>>> page’s worth of random reads.
>>> 
>>> We could change it to max(PAGE_SIZE, 128K) or just bump to 256K.
>>> 
>> 
>> But if 256K is problematic in other ways, is it worth bumping
>> BTRFS_MAX_COMPRESSED to 256K?
>> 
>> David, in the mail below, said that 256K support would require deeper
>> changes.  So disabling BTRFS support seems the easiest solution for the
>> time being, at least for Stable (I forgot the Fixes: tag and the CC: to
>> stable).
>> 
>> On powerpc, 256k pages is a corner case; it requires customised binutils,
>> so I don't think disabling BTRFS is an issue there. For hexagon I don't
>> know.
> 
> That it blew up due to the max compressed size is a coincidence. We
> could have explicit BUILD_BUG_ONs for page size or other constraints
> derived from the page size like INLINE_EXTENT_BUFFER_PAGES.
> 

Right, the constraint is bigger and more complex than BTRFS_MAX_COMPRESSED.

> And there's no such thing as "just bump BTRFS_MAX_COMPRESSED to 256K".
> The constant is part of the on-disk format for lzo, and otherwise changing it
> would impact performance, so this would need proper evaluation.

Sorry, how is it baked into LZO?  It definitely will have performance 
implications, I agree there.

-chris



Re: [PATCH v9 06/14] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-11 Thread Claire Chang
I don't have the HW to verify the change. Hopefully I used the right
device struct for is_swiotlb_active.


Re: [PATCH v8 00/15] Restricted DMA

2021-06-11 Thread Claire Chang
v9 here: https://lore.kernel.org/patchwork/cover/1445081/

On Mon, Jun 7, 2021 at 11:28 AM Claire Chang  wrote:
>
> On Sat, Jun 5, 2021 at 1:48 AM Will Deacon  wrote:
> >
> > Hi Claire,
> >
> > On Thu, May 27, 2021 at 08:58:30PM +0800, Claire Chang wrote:
> > > This series implements mitigations for lack of DMA access control on
> > > systems without an IOMMU, which could result in the DMA accessing the
> > > system memory at unexpected times and/or unexpected addresses, possibly
> > > leading to data leakage or corruption.
> > >
> > > For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
> > > not behind an IOMMU. As PCI-e, by design, gives the device full access to
> > > system memory, a vulnerability in the Wi-Fi firmware could easily escalate
> > > to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
> > > full chain of exploits; [2], [3]).
> > >
> > > To mitigate the security concerns, we introduce restricted DMA. Restricted
> > > DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
> > > specially allocated region and does memory allocation from the same 
> > > region.
> > > The feature on its own provides a basic level of protection against the 
> > > DMA
> > > overwriting buffer contents at unexpected times. However, to protect
> > > against general data leakage and system memory corruption, the system 
> > > needs
> > > to provide a way to restrict the DMA to a predefined memory region (this 
> > > is
> > > usually done at firmware level, e.g. MPU in ATF on some ARM platforms 
> > > [4]).
> > >
> > > [1a] 
> > > https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
> > > [1b] 
> > > https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
> > > [2] https://blade.tencent.com/en/advisories/qualpwn/
> > > [3] 
> > > https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
> > > [4] 
> > > https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132
> > >
> > > v8:
> > > - Fix reserved-memory.txt and add the reg property in example.
> > > - Fix sizeof for of_property_count_elems_of_size in
> > >   drivers/of/address.c#of_dma_set_restricted_buffer.
> > > - Apply Will's suggestion to try the OF node having DMA configuration in
> > >   drivers/of/address.c#of_dma_set_restricted_buffer.
> > > - Fix typo in the comment of 
> > > drivers/of/address.c#of_dma_set_restricted_buffer.
> > > - Add error message for PageHighMem in
> > >   kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
> > >   rmem_swiotlb_setup.
> > > - Fix the message string in rmem_swiotlb_setup.
> >
> > Thanks for the v8. It works for me out of the box on arm64 under KVM, so:
> >
> > Tested-by: Will Deacon 
> >
> > Note that something seems to have gone wrong with the mail threading, so
> > the last 5 patches ended up as a separate thread for me. Probably worth
> > posting again with all the patches in one place, if you can.
>
> Thanks for testing.
>
> Christoph also added some comments in v7, so I'll prepare v9.
>
> >
> > Cheers,
> >
> > Will


Re: [PATCH v9 03/14] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-11 Thread Claire Chang
I'm not sure if this would break arch/x86/pci/sta2x11-fixup.c, where
swiotlb_late_init_with_default_size is called:
https://elixir.bootlin.com/linux/v5.13-rc5/source/arch/x86/pci/sta2x11-fixup.c#L60

On Fri, Jun 11, 2021 at 11:27 PM Claire Chang  wrote:
>
> Always have the pointer to the swiotlb pool used in struct device. This
> could help simplify the code for other pools.
>
> Signed-off-by: Claire Chang 
> ---
>  drivers/of/device.c | 3 +++
>  include/linux/device.h  | 4 
>  include/linux/swiotlb.h | 8 
>  kernel/dma/swiotlb.c| 8 
>  4 files changed, 19 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/of/device.c b/drivers/of/device.c
> index c5a9473a5fb1..1defdf15ba95 100644
> --- a/drivers/of/device.c
> +++ b/drivers/of/device.c
> @@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct 
> device_node *np,
>
> arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
>
> +   if (IS_ENABLED(CONFIG_SWIOTLB))
> +   swiotlb_set_io_tlb_default_mem(dev);
> +
> return 0;
>  }
>  EXPORT_SYMBOL_GPL(of_dma_configure_id);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 4443e12238a0..2e9a378c9100 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -432,6 +432,7 @@ struct dev_links_info {
>   * @dma_pools: Dma pools (if dma'ble device).
>   * @dma_mem:   Internal for coherent mem override.
>   * @cma_area:  Contiguous memory area for dma allocations
> + * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
>   * @archdata:  For arch-specific additions.
>   * @of_node:   Associated device tree node.
>   * @fwnode:Associated device node supplied by platform firmware.
> @@ -540,6 +541,9 @@ struct device {
>  #ifdef CONFIG_DMA_CMA
> struct cma *cma_area;   /* contiguous memory area for dma
>allocations */
> +#endif
> +#ifdef CONFIG_SWIOTLB
> +   struct io_tlb_mem *dma_io_tlb_mem;
>  #endif
> /* arch specific additions */
> struct dev_archdata archdata;
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 216854a5e513..008125ccd509 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -108,6 +108,11 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
> return mem && paddr >= mem->start && paddr < mem->end;
>  }
>
> +static inline void swiotlb_set_io_tlb_default_mem(struct device *dev)
> +{
> +   dev->dma_io_tlb_mem = io_tlb_default_mem;
> +}
> +
>  void __init swiotlb_exit(void);
>  unsigned int swiotlb_max_segment(void);
>  size_t swiotlb_max_mapping_size(struct device *dev);
> @@ -119,6 +124,9 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
>  {
> return false;
>  }
> +static inline void swiotlb_set_io_tlb_default_mem(struct device *dev)
> +{
> +}
>  static inline void swiotlb_exit(void)
>  {
>  }
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 8a3e2b3b246d..29b950ab1351 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -344,7 +344,7 @@ void __init swiotlb_exit(void)
>  static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
> size,
>enum dma_data_direction dir)
>  {
> -   struct io_tlb_mem *mem = io_tlb_default_mem;
> +   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
> phys_addr_t orig_addr = mem->slots[index].orig_addr;
> size_t alloc_size = mem->slots[index].alloc_size;
> @@ -426,7 +426,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
> unsigned int index)
>  static int find_slots(struct device *dev, phys_addr_t orig_addr,
> size_t alloc_size)
>  {
> -   struct io_tlb_mem *mem = io_tlb_default_mem;
> +   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> unsigned long boundary_mask = dma_get_seg_boundary(dev);
> dma_addr_t tbl_dma_addr =
> phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
> @@ -503,7 +503,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
> phys_addr_t orig_addr,
> size_t mapping_size, size_t alloc_size,
> enum dma_data_direction dir, unsigned long attrs)
>  {
> -   struct io_tlb_mem *mem = io_tlb_default_mem;
> +   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> unsigned int offset = swiotlb_align_offset(dev, orig_addr);
> unsigned int i;
> int index;
> @@ -554,7 +554,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>   size_t mapping_size, enum dma_data_direction 
> dir,
>   unsigned long attrs)
>  {
> -   struct io_tlb_mem *mem = io_tlb_default_mem;
> +   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
> unsigned long flags;
> unsigned int offset = 

[PATCH v9 14/14] of: Add plumbing for restricted DMA pool

2021-06-11 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when the restricted-dma-pool is present.

Signed-off-by: Claire Chang 
---
 drivers/of/address.c| 33 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 3b2acca7e363..c8066d95ff0e 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1001,6 +1002,38 @@ int of_dma_get_range(struct device_node *np, const 
struct bus_dma_region **map)
of_node_put(node);
return ret;
 }
+
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+   struct device_node *node, *of_node = dev->of_node;
+   int count, i;
+
+   count = of_property_count_elems_of_size(of_node, "memory-region",
+   sizeof(u32));
+   /*
+* If dev->of_node doesn't exist or doesn't contain memory-region, try
+* the OF node having DMA configuration.
+*/
+   if (count <= 0) {
+   of_node = np;
+   count = of_property_count_elems_of_size(
+   of_node, "memory-region", sizeof(u32));
+   }
+
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(of_node, "memory-region", i);
+   /*
+* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(dev, of_node,
+ i);
+   }
+
+   return 0;
+}
 #endif /* CONFIG_HAS_DMA */
 
 /**
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 1defdf15ba95..ba4656e77502 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -168,6 +168,9 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
if (IS_ENABLED(CONFIG_SWIOTLB))
swiotlb_set_io_tlb_default_mem(dev);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev, np);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index 631489f7f8c0..376462798f7e 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -163,12 +163,18 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+  struct device_node *np)
+{
+   return -ENODEV;
+}
 #endif
 
 void fdt_init_reserved_mem(void);
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 13/14] dt-bindings: of: Add restricted DMA pool

2021-06-11 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
---
 .../reserved-memory/reserved-memory.txt   | 36 +--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git 
a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..46804f24df05 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage and
+  system memory corruption, the system needs to provide a way to lock down
+  the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic
+  coherent allocation.
 - vendor specific string in the form ,[-]
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for 
each corresponding
 
 Example
 ---
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
 one default of all device drivers (named linux,cma@72000000 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB), and
-one for multimedia processing (named multimedia-memory@77000000, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@78000000, 8MiB),
+one for multimedia processing (named multimedia-memory@77000000, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x50000000, 64MiB).
 
 / {
#address-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x77000000 0x4000000>;
};
+
+   restricted_dma_reserved: restricted_dma_reserved {
+   compatible = "restricted-dma-pool";
+   reg = <0x50000000 0x4000000>;
+   };
};
 
/* ... */
@@ -138,4 +161,11 @@ one for multimedia processing (named multimedia-memory@77000000, 64MiB).
memory-region = <&multimedia_reserved>;
/* ... */
};
+
+   pcie_device: pcie_device@0,0 {
+   reg = <0x83010000 0x0 0x00000000 0x0 0x00100000
+  0x83010000 0x0 0x00100000 0x0 0x00100000>;
+   memory-region = <&restricted_dma_mem_reserved>;
+   /* ... */
+   };
 };
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 12/14] dma-direct: Allocate memory from restricted DMA pool if available

2021-06-11 Thread Claire Chang
The restricted DMA pool is preferred if available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.

Signed-off-by: Claire Chang 
---
 kernel/dma/direct.c | 37 -
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index eb4098323bbc..73fc4c659ba7 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -78,6 +78,9 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
 static void __dma_direct_free_pages(struct device *dev, struct page *page,
size_t size)
 {
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   swiotlb_free(dev, page, size))
+   return;
dma_free_contiguous(dev, page, size);
 }
 
@@ -92,7 +95,17 @@ static struct page *__dma_direct_alloc_pages(struct device 
*dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   _limit);
-   page = dma_alloc_contiguous(dev, size, gfp);
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL)) {
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   page = NULL;
+   }
+   return page;
+   }
+
+   if (!page)
+   page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
page = NULL;
@@ -148,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -161,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_dev_swiotlb_force) is required, one must
+* set up another device coherent pool by shared-dma-pool and use
+* dma_alloc_from_dev_coherent instead.
 */
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
!gfpflags_allow_blocking(gfp) &&
(force_dma_unencrypted(dev) ||
-(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && 
!dev_is_dma_coherent(dev
+(IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+ !dev_is_dma_coherent(dev))) &&
+   !is_dev_swiotlb_force(dev))
return dma_direct_alloc_from_pool(dev, size, dma_handle, gfp);
 
/* we always manually zero the memory once we are done */
@@ -253,15 +271,15 @@ void dma_direct_free(struct device *dev, size_t size,
unsigned int page_order = get_order(size);
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_dev_swiotlb_force(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
dma_free_contiguous(dev, cpu_addr, size);
return;
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev)) {
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_dev_swiotlb_force(dev)) {
arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
return;
}
@@ -289,7 +307,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
size_t size,
void *ret;
 
if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
-   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp))
+   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp) &&
+   

[PATCH v9 11/14] swiotlb: Add restricted DMA alloc/free support.

2021-06-11 Thread Claire Chang
Add the functions, swiotlb_{alloc,free} to support the memory allocation
from restricted DMA pool.

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 15 +++
 kernel/dma/swiotlb.c| 35 +--
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8200c100fe10..d3374497a4f8 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -162,4 +162,19 @@ static inline void swiotlb_adjust_size(unsigned long size)
 extern void swiotlb_print_info(void);
 extern void swiotlb_set_max_segment(unsigned int);
 
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+#else
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   return NULL;
+}
+static inline bool swiotlb_free(struct device *dev, struct page *page,
+   size_t size)
+{
+   return false;
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index a6562573f090..0a19858da5b8 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -461,8 +461,9 @@ static int find_slots(struct device *dev, phys_addr_t 
orig_addr,
 
index = wrap = wrap_index(mem, ALIGN(mem->index, stride));
do {
-   if ((slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
-   (orig_addr & iotlb_align_mask)) {
+   if (orig_addr &&
+   (slot_addr(tbl_dma_addr, index) & iotlb_align_mask) !=
+   (orig_addr & iotlb_align_mask)) {
index = wrap_index(mem, index + 1);
continue;
}
@@ -702,6 +703,36 @@ late_initcall(swiotlb_create_default_debugfs);
 #endif
 
 #ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
+   phys_addr_t tlb_addr;
+   int index;
+
+   if (!mem)
+   return NULL;
+
+   index = find_slots(dev, 0, size);
+   if (index == -1)
+   return NULL;
+
+   tlb_addr = slot_addr(mem->start, index);
+
+   return pfn_to_page(PFN_DOWN(tlb_addr));
+}
+
+bool swiotlb_free(struct device *dev, struct page *page, size_t size)
+{
+   phys_addr_t tlb_addr = page_to_phys(page);
+
+   if (!is_swiotlb_buffer(dev, tlb_addr))
+   return false;
+
+   release_slots(dev, tlb_addr);
+
+   return true;
+}
+
 static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
struct device *dev)
 {
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 10/14] dma-direct: Add a new wrapper __dma_direct_free_pages()

2021-06-11 Thread Claire Chang
Add a new wrapper __dma_direct_free_pages() that will be useful later
for swiotlb_free().

Signed-off-by: Claire Chang 
---
 kernel/dma/direct.c | 14 ++
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 078f7087e466..eb4098323bbc 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,12 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -237,7 +243,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
return NULL;
}
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
return NULL;
 }
 
@@ -273,7 +279,7 @@ void dma_direct_free(struct device *dev, size_t size,
else if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_CLEAR_UNCACHED))
arch_dma_clear_uncached(cpu_addr, size);
 
-   dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);
+   __dma_direct_free_pages(dev, dma_direct_to_page(dev, dma_addr), size);
 }
 
 struct page *dma_direct_alloc_pages(struct device *dev, size_t size,
@@ -310,7 +316,7 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
size_t size,
*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
return page;
 out_free_pages:
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
return NULL;
 }
 
@@ -329,7 +335,7 @@ void dma_direct_free_pages(struct device *dev, size_t size,
if (force_dma_unencrypted(dev))
set_memory_encrypted((unsigned long)vaddr, 1 << page_order);
 
-   dma_free_contiguous(dev, page, size);
+   __dma_direct_free_pages(dev, page, size);
 }
 
 #if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 09/14] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-06-11 Thread Claire Chang
Add a new function, release_slots, to make the code reusable for supporting
different bounce buffer pools, e.g. restricted DMA pool.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 364c6c822063..a6562573f090 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -554,27 +554,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -609,6 +597,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 08/14] swiotlb: Move alloc_size to find_slots

2021-06-11 Thread Claire Chang
Move the maintenance of alloc_size to find_slots for better code
reusability later.

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e5ccc198d0a7..364c6c822063 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -486,8 +486,11 @@ static int find_slots(struct device *dev, phys_addr_t 
orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -542,11 +545,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 07/14] swiotlb: Bounce data from/to restricted DMA pool if available

2021-06-11 Thread Claire Chang
Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Note that is_dev_swiotlb_force doesn't check if
swiotlb_force == SWIOTLB_FORCE. Otherwise the memory allocation behavior
with default swiotlb would be changed by the following patch
("dma-direct: Allocate memory from restricted DMA pool if available").

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h | 10 +-
 kernel/dma/direct.c |  3 ++-
 kernel/dma/direct.h |  3 ++-
 kernel/dma/swiotlb.c|  1 +
 4 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 06cf17a80f5c..8200c100fe10 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
  * unmap calls.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
+ * @force_swiotlb: %true if swiotlb is forced
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -95,6 +96,7 @@ struct io_tlb_mem {
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+   bool force_swiotlb;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -115,6 +117,11 @@ static inline void swiotlb_set_io_tlb_default_mem(struct 
device *dev)
dev->dma_io_tlb_mem = io_tlb_default_mem;
 }
 
+static inline bool is_dev_swiotlb_force(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->force_swiotlb;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -126,8 +133,9 @@ static inline bool is_swiotlb_buffer(struct device *dev, 
phys_addr_t paddr)
 {
return false;
 }
-static inline void swiotlb_set_io_tlb_default_mem(struct device *dev)
+static inline bool is_dev_swiotlb_force(struct device *dev)
 {
+   return false;
 }
 static inline void swiotlb_exit(void)
 {
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..078f7087e466 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,8 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE ||
+is_dev_swiotlb_force(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..f94813674e23 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,8 @@ static inline dma_addr_t dma_direct_map_page(struct device 
*dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (unlikely(swiotlb_force == SWIOTLB_FORCE) ||
+   is_dev_swiotlb_force(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 21e99907edd6..e5ccc198d0a7 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -714,6 +714,7 @@ static int rmem_swiotlb_device_init(struct reserved_mem 
*rmem,
return -ENOMEM;
 
swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false, true);
+   mem->force_swiotlb = true;
 
rmem->priv = mem;
 
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 06/14] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-11 Thread Claire Chang
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for the restricted DMA pool.

Signed-off-by: Claire Chang 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c 
b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index ce6b664b10aa..89a894354263 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct 
drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(obj->base.dev->dev)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c 
b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index f4c2e46b6fe1..2ca9d9a9e5d5 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -276,7 +276,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(dev->dev);
 #endif
 
ret = ttm_device_init(&drm->ttm.bdev, &nouveau_bo_driver, drm->dev->dev,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..0d56985bfe81 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct 
pcifront_device *pdev)
 
spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+   if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
err = pci_xen_swiotlb_init_late();
if (err)
dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 921b469c6ad2..06cf17a80f5c 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -118,7 +118,7 @@ static inline void swiotlb_set_io_tlb_default_mem(struct 
device *dev)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -141,7 +141,7 @@ static inline size_t swiotlb_max_mapping_size(struct device 
*dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index c4a071d6a63f..21e99907edd6 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -666,9 +666,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return dev->dma_io_tlb_mem != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 05/14] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-11 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for the restricted DMA pool.

Signed-off-by: Claire Chang 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  7 ---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 5d96fcc45fec..1a6a08908245 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -506,7 +506,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, 
dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -577,7 +577,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device 
*dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -783,7 +783,7 @@ static void iommu_dma_sync_single_for_cpu(struct device 
*dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -796,7 +796,7 @@ static void iommu_dma_sync_single_for_device(struct device 
*dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -817,7 +817,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -834,7 +834,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 24d11861ac7d..0c4fb34f11ab 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, 
dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index ec0c01796c8a..921b469c6ad2 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include 
 #include 
 #include 
 #include 
@@ -102,9 +103,9 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -121,7 +122,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))
swiotlb_sync_single_for_device(dev, paddr, sg->length,

[PATCH v9 04/14] swiotlb: Add restricted DMA pool initialization

2021-06-11 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.
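
For context, such a pool is described by a reserved-memory node; a
hypothetical device-tree fragment (labels, addresses and sizes invented
here; the binding itself is added by the dt-bindings patch of this series):

    reserved-memory {
            #address-cells = <1>;
            #size-cells = <1>;
            ranges;

            wifi_pool: restricted-dma-pool@50000000 {
                    compatible = "restricted-dma-pool";
                    reg = <0x50000000 0x400000>;    /* 4 MiB pool */
            };
    };

    wifi: wifi@0 {
            /* a device opts in by referencing the pool */
            memory-region = <&wifi_pool>;
    };

rmem_swiotlb_setup() below matches the "restricted-dma-pool" compatible and
rmem_swiotlb_device_init() attaches the pool to each device that references
it.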

Signed-off-by: Claire Chang 
---
 include/linux/swiotlb.h |  3 +-
 kernel/dma/Kconfig  | 14 
 kernel/dma/swiotlb.c| 75 +
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 008125ccd509..ec0c01796c8a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,7 +72,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ 
+ and .
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 29b950ab1351..c4a071d6a63f 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include 
+#include 
+#include 
+#include 
+#include 
+#endif
 
 #include 
 #include 
@@ -688,3 +695,71 @@ static int __init swiotlb_create_default_debugfs(void)
 late_initcall(swiotlb_create_default_debugfs);
 
 #endif
+
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   /*
+* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false, true);
+
+   rmem->priv = mem;
+
+   if (IS_ENABLED(CONFIG_DEBUG_FS)) {
+   mem->debugfs =
+   debugfs_create_dir(rmem->name, debugfs_dir);
+   swiotlb_create_debugfs_files(mem);
+   }
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+   if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+   pr_err("Restricted DMA pool must be accessible within the linear mapping.");
+   return -EINVAL;
+   }
+
+   rmem->ops = &rmem_swiotlb_ops;
+   pr_info("Reserved memory: created restricted DMA pool at %pa, size %ld MiB\n",
+   &rmem->base, (unsigned long)rmem->size / SZ_1M);
+   return 0;
+}
+
+RESERVEDMEM_OF_DECLARE(dma, "restricted-dma-pool", rmem_swiotlb_setup);
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 03/14] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-11 Thread Claire Chang
Always have the pointer to the swiotlb pool used in struct device. This
could help simplify the code for other pools.
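
The resulting pattern, in sketch form (both lines appear in the hunks
below):

    /* wired up when the device's DMA is configured: */
    dev->dma_io_tlb_mem = io_tlb_default_mem;

    /* so the swiotlb helpers start from the device, not the global: */
    struct io_tlb_mem *mem = dev->dma_io_tlb_mem;

A later patch can then point dma_io_tlb_mem at a restricted pool without
touching the helpers again.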

Signed-off-by: Claire Chang 
---
 drivers/of/device.c | 3 +++
 include/linux/device.h  | 4 
 include/linux/swiotlb.h | 8 
 kernel/dma/swiotlb.c| 8 
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/of/device.c b/drivers/of/device.c
index c5a9473a5fb1..1defdf15ba95 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct 
device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (IS_ENABLED(CONFIG_SWIOTLB))
+   swiotlb_set_io_tlb_default_mem(dev);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/include/linux/device.h b/include/linux/device.h
index 4443e12238a0..2e9a378c9100 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -432,6 +432,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -540,6 +541,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..008125ccd509 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -108,6 +108,11 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline void swiotlb_set_io_tlb_default_mem(struct device *dev)
+{
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -119,6 +124,9 @@ static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
return false;
 }
+static inline void swiotlb_set_io_tlb_default_mem(struct device *dev)
+{
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8a3e2b3b246d..29b950ab1351 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -344,7 +344,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t 
size,
   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = mem->slots[index].alloc_size;
@@ -426,7 +426,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, 
unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -503,7 +503,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, 
phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -554,7 +554,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
unsigned long flags;
unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 02/14] swiotlb: Refactor swiotlb_create_debugfs

2021-06-11 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools, e.g. the restricted DMA pool.
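
Patch 4 of this series ("swiotlb: Add restricted DMA pool initialization")
then reuses the split for a per-pool directory, roughly:

    mem->debugfs = debugfs_create_dir(rmem->name, debugfs_dir);
    swiotlb_create_debugfs_files(mem);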

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1a1208c81e85..8a3e2b3b246d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -64,6 +64,9 @@
 enum swiotlb_force swiotlb_force;
 
 struct io_tlb_mem *io_tlb_default_mem;
+#ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
+#endif
 
 /*
  * Max segment that we can provide which (if pages are contingous) will
@@ -664,18 +667,24 @@ EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
-   if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+   if (mem) {
+   mem->debugfs = debugfs_dir;
+   swiotlb_create_debugfs_files(mem);
+   }
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 01/14] swiotlb: Refactor swiotlb init functions

2021-06-11 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.
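
Both init paths then reduce to a single call each; the two trailing flags
are late_alloc and memory_decrypted:

    /* early (memblock) init: */
    swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false, false);

    /* late (page allocator) init: */
    swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true, true);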

Signed-off-by: Claire Chang 
---
 kernel/dma/swiotlb.c | 53 ++--
 1 file changed, 27 insertions(+), 26 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8ca7d505d61c..1a1208c81e85 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,32 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc,
+   bool memory_decrypted)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+   spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+
+   if (memory_decrypted)
+   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +209,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,7 +297,6 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
@@ -297,20 +311,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
-   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true, true);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.32.0.272.g935e593368-goog



[PATCH v9 00/14] Restricted DMA

2021-06-11 Thread Claire Chang
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that show a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).

[1a] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] 
https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] 
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v9:
Address the comments in v7 to
  - set swiotlb active pool to dev->dma_io_tlb_mem
  - get rid of get_io_tlb_mem
  - dig out the device struct for is_swiotlb_active
  - move debugfs_create_dir out of swiotlb_create_debugfs
  - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
  - use IS_ENABLED in kernel/dma/direct.c
  - fix redefinition of 'of_dma_set_restricted_buffer'

v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
  kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
  rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.
https://lore.kernel.org/patchwork/cover/1437112/

v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/

v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/


Claire Chang (14):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
  swiotlb: Add restricted DMA pool initialization
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Bounce data from/to restricted DMA pool if available
  swiotlb: Move alloc_size to find_slots
  swiotlb: Refactor swiotlb_tbl_unmap_single
  dma-direct: Add a new wrapper __dma_direct_free_pages()
  swiotlb: Add restricted DMA alloc/free support.
  dma-direct: Allocate memory from restricted DMA pool if available
  dt-bindings: of: Add restricted DMA pool
  of: Add plumbing for restricted DMA pool

 .../reserved-memory/reserved-memory.txt   |  36 ++-
 drivers/gpu/drm/i915/gem/i915_gem_internal.c  |   2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c |   2 +-
 drivers/iommu/dma-iommu.c |  12 +-
 drivers/of/address.c  |  33 +++
 drivers/of/device.c   |   6 +
 drivers/of/of_private.h   |   6 +
 drivers/pci/xen-pcifront.c|   2 +-
 drivers/xen/swiotlb-xen.c |   2 +-
 include/linux/device.h|   4 +
 include/linux/swiotlb.h   |  45 +++-
 kernel/dma/Kconfig|  14 +
 kernel/dma/direct.c   |  62 +++--
 kernel/dma/direct.h   |   9 +-
 kernel/dma/swiotlb.c  | 242 

Re: [PATCH 0/1] PPC32: fix ptrace() access to FPU registers

2021-06-11 Thread Radu Rendec
On Fri, 2021-06-11 at 08:02 +0200, Christophe Leroy wrote:
>On 19/06/2019 at 14:57, Radu Rendec wrote:
>> On Wed, 2019-06-19 at 10:36 +1000, Daniel Axtens wrote:
>>> Andreas Schwab <sch...@linux-m68k.org> writes:
>>>
>>>> On Jun 18 2019, Radu Rendec <radu.ren...@gmail.com> wrote:

> Since you already have a working setup, it would be nice if you could
> add a printk to arch_ptrace() to print the address and confirm what I
> believe happens (by reading the gdb source code).

 A ppc32 ptrace syscall goes through compat_arch_ptrace.
>>
>> Right. I completely overlooked that part.
>>
>>> Ah right, and that (in ptrace32.c) contains code that will work:
>>>
>>>
>>> /*
>>>  * the user space code considers the floating point
>>>  * to be an array of unsigned int (32 bits) - the
>>>  * index passed in is based on this assumption.
>>>  */
>>> tmp = ((unsigned int *)child->thread.fp_state.fpr)
>>> [FPRINDEX(index)];
>>>
>>> FPRINDEX is defined above to deal with the various manipulations you
>>> need to do.
>>
>> Correct. Basically it does the same that I did in my patch: it divides
>> the index again by 2 (it's already divided by 4 in compat_arch_ptrace()
>> so it ends up divided by 8), then takes the least significant bit and
>> adds it to the index. I take bit 2 of the original address, which is the
>> same thing (because in FPRHALF() the address is already divided by 4).
>>
>> So we have this in ptrace32.c:
>>
>> #define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
>> #define FPRHALF(i) (((i) - PT_FPR0) & 1)
>> #define FPRINDEX(i) TS_FPRWIDTH * FPRNUMBER(i) * 2 + FPRHALF(i)
>>
>> index = (unsigned long) addr >> 2;
>> ((unsigned int *)child->thread.fp_state.fpr)[FPRINDEX(index)]
>>
>>
>> And we have this in my patch:
>>
>> fpidx = (addr - PT_FPR0 * sizeof(long)) / 8;
>> (void *)&child->thread.TS_FPR(fpidx) + (addr & 4)
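
A quick stand-alone check that the two computations select the same 32-bit
word (a sketch assuming PT_FPR0 = 48 as on ppc32, TS_FPRWIDTH = 1, i.e. no
VSX, and sizeof(long) == 4):

    #include <assert.h>
    #include <stdio.h>

    #define PT_FPR0      48
    #define TS_FPRWIDTH  1

    #define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
    #define FPRHALF(i)   (((i) - PT_FPR0) & 1)
    #define FPRINDEX(i)  (TS_FPRWIDTH * FPRNUMBER(i) * 2 + FPRHALF(i))

    int main(void)
    {
            unsigned long addr;

            for (addr = PT_FPR0 * 4; addr < (PT_FPR0 + 64) * 4; addr += 4) {
                    /* ptrace32.c: index is the 4-byte word number */
                    unsigned long index = addr >> 2;
                    unsigned long a = FPRINDEX(index);

                    /* patch: fpidx picks the 8-byte register and addr & 4
                     * the half; converted to a word index for comparison */
                    unsigned long fpidx = (addr - PT_FPR0 * 4) / 8;
                    unsigned long b = fpidx * 2 + ((addr & 4) ? 1 : 0);

                    assert(a == b);
            }
            printf("index math agrees for all 32 FPRs\n");
            return 0;
    }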
>>
>>> Radu: I think we want to copy that working code back into ptrace.c.
>>
>> I'm not sure that would work. There's a subtle difference: the code in
>> ptrace32.c is always compiled on a 64-bit kernel and the user space
>> calling it is always 32-bit; on the other hand, the code in ptrace.c can
>> be compiled on either a 64-bit kernel or a 32-bit kernel and the user
>> space calling it always has the same "bitness" as the kernel.
>>
>> One difference is the size of the CPU registers. On 64-bit they are 8
>> bytes long and user space knows that and generates 8-byte aligned
>> addresses. So you have to divide the address by 8 to calculate the CPU
>> register index correctly, which compat_arch_ptrace() currently doesn't.
>>
>> Another difference is that on 64-bit `long` is 8 bytes, so user space
>> can read a whole FPU register in a single ptrace call.
>>
>> Now that we are all aware of compat_arch_ptrace() (which handles the
>> special case of a 32-bit process running on a 64-bit kernel) I would say
>> the patch is correct and does the right thing for both 32-bit and 64-bit
>> kernels and processes.
>>
>>> The challenge will be unpicking the awful mess of ifdefs in ptrace.c
>>> and making it somewhat more comprehensible.
>>
>> I'm not sure what ifdefs you're thinking about. The only that are used
>> inside arch_ptrace() are PT_FPR0, PT_FPSCR and TS_FPR, which seem to be
>> correct.
>>
>> But perhaps it would be useful to change my patch and add a comment just
>> before arch_ptrace() that explains how the math is done and that the
>> code must work on both 32-bit and 64-bit, the user space address
>> assumptions, etc.
>>
>> By the way, I'm not sure the code in compat_arch_ptrace() handles
>> PT_FPSCR correctly. It might (just because fpscr is right next to fpr[]
>> in memory - and that's a hack), but I can't figure out if it accesses
>> the right half.
>>
>
>Does the issue still exists ? If yes, the patch has to be rebased.

Hard to say. I'm still using 4.9 (stable) on the systems that I created
the patch for. I tried to rebase, and the patch no longer applies. It
looks like there have been some changes around that area, notably your
commit e009fa433542, so it could actually be fixed now.

It's been exactly two years since I sent the patch and I don't remember
all the details. I will have to go back and look. Also, running a recent
kernel on my PPC32 systems is not an option because there are a bunch of
custom patches that would have to be ported. I will try in a VM and get
back to you, hopefully early next week.

Best regards,
Radu



Re: [PATCH] btrfs: Disable BTRFS on platforms having 256K pages

2021-06-11 Thread David Sterba
On Fri, Jun 11, 2021 at 12:58:58PM +, Chris Mason wrote:
> > On Jun 10, 2021, at 12:20 PM, David Sterba  wrote:
> > On Thu, Jun 10, 2021 at 04:50:09PM +0200, Christophe Leroy wrote:
> >> On 10/06/2021 at 15:54, Chris Mason wrote:
>  On Jun 10, 2021, at 1:23 AM, Christophe Leroy 
>   wrote:
> > And there's no such thing like "just bump BTRFS_MAX_COMPRESSED to 256K".
> > The constant is part of on-disk format for lzo and otherwise changing it
> > would impact performance so this would need proper evaluation.
> 
> Sorry, how is it baked into LZO?  It definitely will have performance 
> implications, I agree there.

lzo_decompress_bio:

309 /*
310  * Compressed data header check.
311  *
312  * The real compressed size can't exceed the maximum extent length, and
313  * all pages should be used (whole unused page with just the segment
314  * header is not possible).  If this happens it means the compressed
315  * extent is corrupted.
316  */
317 if (tot_len > min_t(size_t, BTRFS_MAX_COMPRESSED, srclen) ||
318 tot_len < srclen - PAGE_SIZE) {
319 ret = -EUCLEAN;
320 goto done;
321 }


[RFC 1/2] powerpc/cpuidle: Extract IPI based and timer based wakeup latency from idle states

2021-06-11 Thread Pratik R. Sampat
Introduce a mechanism to fire directed IPIs from a specified source CPU
to a specified target CPU and measure the difference in time incurred on
wakeup.

Also, introduce a mechanism to queue an HR timer on a specified CPU and
subsequently measure the time taken to wakeup the CPU.

Finally define a simple debugfs interface to control the knobs to fire
the IPI and Timer events on specified CPU and view their incurred idle
wakeup latencies.

Signed-off-by: Pratik R. Sampat 
---
 arch/powerpc/kernel/Makefile   |   1 +
 arch/powerpc/kernel/test-cpuidle_latency.c | 157 +
 lib/Kconfig.debug  |  10 ++
 3 files changed, 168 insertions(+)
 create mode 100644 arch/powerpc/kernel/test-cpuidle_latency.c

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index f66b63e81c3b..56e36e797dd4 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_PPC_WATCHDOG)+= watchdog.o
 obj-$(CONFIG_HAVE_HW_BREAKPOINT)   += hw_breakpoint.o
 obj-$(CONFIG_PPC_DAWR) += dawr.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_ppc970.o cpu_setup_pa6t.o
+obj-$(CONFIG_IDLE_LATENCY_SELFTEST)  += test-cpuidle_latency.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= cpu_setup_power.o
 obj-$(CONFIG_PPC_BOOK3S_64)+= mce.o mce_power.o
 obj-$(CONFIG_PPC_BOOK3E_64)+= exceptions-64e.o idle_book3e.o
diff --git a/arch/powerpc/kernel/test-cpuidle_latency.c 
b/arch/powerpc/kernel/test-cpuidle_latency.c
new file mode 100644
index ..f138011ac225
--- /dev/null
+++ b/arch/powerpc/kernel/test-cpuidle_latency.c
@@ -0,0 +1,157 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/*
+ * Module-based API test facility for cpuidle latency using IPIs and timers
+ */
+
+#include 
+#include 
+#include 
+
+/*
+ * IPI based wakeup latencies
+ * Measure time taken for a CPU to wakeup on a IPI sent from another CPU
+ * The latency measured also includes the latency of sending the IPI
+ */
+struct latency {
+   unsigned int src_cpu;
+   unsigned int dest_cpu;
+   ktime_t time_start;
+   ktime_t time_end;
+   u64 latency_ns;
+} ipi_wakeup;
+
+static void measure_latency(void *info)
+{
+   struct latency *v;
+   ktime_t time_diff;
+
+   v = (struct latency *)info;
+   v->time_end = ktime_get();
+   time_diff = ktime_sub(v->time_end, v->time_start);
+   v->latency_ns = ktime_to_ns(time_diff);
+}
+
+void run_smp_call_function_test(unsigned int cpu)
+{
+   ipi_wakeup.src_cpu = smp_processor_id();
+   ipi_wakeup.dest_cpu = cpu;
+   ipi_wakeup.time_start = ktime_get();
+   smp_call_function_single(cpu, measure_latency, &ipi_wakeup, 1);
+}
+
+/*
+ * Timer based wakeup latencies
+ * Measure time taken for a CPU to wakeup on a timer being armed and fired
+ */
+struct timer_data {
+   unsigned int src_cpu;
+   u64 timeout;
+   ktime_t time_start;
+   ktime_t time_end;
+   struct hrtimer timer;
+   u64 timeout_diff_ns;
+} timer_wakeup;
+
+static enum hrtimer_restart timer_called(struct hrtimer *hrtimer)
+{
+   struct timer_data *w;
+   ktime_t time_diff;
+
+   w = container_of(hrtimer, struct timer_data, timer);
+   w->time_end = ktime_get();
+
+   time_diff = ktime_sub(w->time_end, w->time_start);
+   time_diff = ktime_sub(time_diff, ns_to_ktime(w->timeout));
+   w->timeout_diff_ns = ktime_to_ns(time_diff);
+   return HRTIMER_NORESTART;
+}
+
+static void run_timer_test(unsigned int ns)
+{
+   hrtimer_init(&timer_wakeup.timer, CLOCK_MONOTONIC,
+HRTIMER_MODE_REL);
+   timer_wakeup.timer.function = timer_called;
+   timer_wakeup.src_cpu = smp_processor_id();
+   timer_wakeup.timeout = ns;
+   timer_wakeup.time_start = ktime_get();
+
+   hrtimer_start(&timer_wakeup.timer, ns_to_ktime(ns),
+ HRTIMER_MODE_REL_PINNED);
+}
+
+static struct dentry *dir;
+
+static int cpu_read_op(void *data, u64 *dest_cpu)
+{
+   *dest_cpu = ipi_wakeup.dest_cpu;
+   return 0;
+}
+
+static int cpu_write_op(void *data, u64 value)
+{
+   run_smp_call_function_test(value);
+   return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(ipi_ops, cpu_read_op, cpu_write_op, "%llu\n");
+
+static int timeout_read_op(void *data, u64 *timeout)
+{
+   *timeout = timer_wakeup.timeout;
+   return 0;
+}
+
+static int timeout_write_op(void *data, u64 value)
+{
+   run_timer_test(value);
+   return 0;
+}
+DEFINE_SIMPLE_ATTRIBUTE(timeout_ops, timeout_read_op, timeout_write_op, 
"%llu\n");
+
+static int __init latency_init(void)
+{
+   struct dentry *temp;
+
+   dir = debugfs_create_dir("latency_test", 0);
+   if (!dir) {
+   pr_alert("latency_test: failed to create /sys/kernel/debug/latency_test\n");
+   return -1;
+   }
+   temp = debugfs_create_file("ipi_cpu_dest",
+  0666,
+  dir,
+

[RFC 2/2] powerpc/selftest: Add support for cpuidle latency measurement

2021-06-11 Thread Pratik R. Sampat
The cpuidle latency selftest provides support to systematically extract,
analyse and present IPI and timer based wakeup latencies for each CPU
and each idle state available on the system.

The selftest leverages test-cpuidle_latency module's debugfs interface
to interact and extract latency information from the kernel.

The selftest inserts the module if not already inserted, disables all
the idle states and enables them one by one testing the following:
1. Keeping source CPU constant, iterate through all the CPUS measuring
  IPI latency for baseline (CPU is busy with cat /dev/random > /dev/null
  workload) and then when the CPU is allowed to be at rest
2. Iterating through all the CPUs, sending expected timer durations to
  be equivalent to the residency of the deepest idle state enabled
  and extracting the difference in time between the time of wakeup and
  the expected timer duration

To run this test specifically:
$ sudo make -C tools/testing/selftests \
  TARGETS="powerpc/cpuidle_latency" run_tests

There are a few optional arguments too that the script can take
[-h ]
[-m ]
[-o ]
[-v  (run on all cpus)]
Default Output location in:
tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.log

To run the test without re-compiling:
$ cd tools/testing/selftests/powerpc/cpuidle_latency/
$ sudo ./cpuidle_latency.sh

Signed-off-by: Pratik R. Sampat 
---
 tools/testing/selftests/powerpc/Makefile  |   1 +
 .../powerpc/cpuidle_latency/.gitignore|   2 +
 .../powerpc/cpuidle_latency/Makefile  |   6 +
 .../cpuidle_latency/cpuidle_latency.sh| 419 ++
 .../powerpc/cpuidle_latency/settings  |   1 +
 5 files changed, 429 insertions(+)
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/Makefile
 create mode 100755 
tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/settings

diff --git a/tools/testing/selftests/powerpc/Makefile 
b/tools/testing/selftests/powerpc/Makefile
index 0830e63818c1..71ce6fff867d 100644
--- a/tools/testing/selftests/powerpc/Makefile
+++ b/tools/testing/selftests/powerpc/Makefile
@@ -17,6 +17,7 @@ SUB_DIRS = alignment  \
   benchmarks   \
   cache_shape  \
   copyloops\
+  cpuidle_latency  \
   dscr \
   mm   \
   nx-gzip  \
diff --git a/tools/testing/selftests/powerpc/cpuidle_latency/.gitignore 
b/tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
new file mode 100644
index ..987f8852dc59
--- /dev/null
+++ b/tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
@@ -0,0 +1,2 @@
+# SPDX-License-Identifier: GPL-2.0-only
+cpuidle_latency.log
diff --git a/tools/testing/selftests/powerpc/cpuidle_latency/Makefile 
b/tools/testing/selftests/powerpc/cpuidle_latency/Makefile
new file mode 100644
index ..04492b6d2582
--- /dev/null
+++ b/tools/testing/selftests/powerpc/cpuidle_latency/Makefile
@@ -0,0 +1,6 @@
+# SPDX-License-Identifier: GPL-2.0
+all:
+
+TEST_PROGS := cpuidle_latency.sh
+
+include ../../lib.mk
diff --git a/tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh 
b/tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
new file mode 100755
index ..6b55167de488
--- /dev/null
+++ b/tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
@@ -0,0 +1,419 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# CPU-Idle latency selftest provides support to systematically extract,
+# analyse and present IPI and timer based wakeup latencies for each CPU
+# and each idle state available on the system by leveraging the
+# test-cpuidle_latency module
+#
+# Author: Pratik R. Sampat 
+
+LOG=cpuidle_latency.log
MODULE=/lib/modules/$(uname -r)/kernel/arch/powerpc/kernel/test-cpuidle_latency.ko
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
+VERBOSE=0
+
+DISABLE=1
+ENABLE=0
+
+helpme()
+{
+   printf "Usage: $0 [-h] [-todg args]
+   [-h ]
+   [-m ]
+   [-o ]
+   [-v ]
+   \n"
+   exit 2
+}
+
+parse_arguments()
+{
+   while getopts ht:m:o:vt:it: arg
+   do
+   case $arg in
+   h) # --help
+   helpme
+   ;;
+   m) # --mod-file
+   MODULE=$OPTARG
+   ;;
+   o) # output log files
+   LOG=$OPTARG
+   ;;
+   v) # Verbose mode - all threads of the CPU
+   VERBOSE=1
+   ;;
+   \?)
+   helpme
+   

[RFC 0/2] CPU-Idle latency selftest framework

2021-06-11 Thread Pratik R. Sampat
A kernel module + userspace driver to estimate the wakeup latency
caused by going into stop states. The motivation behind this program is
to find significant deviations behind advertised latency and residency
values.

The patchset measures latencies for two kinds of events: IPIs and timers.
As this is a software-only mechanism, there will be additional latencies
from the kernel-firmware-hardware interactions. To account for that, the
program also measures a baseline latency on a 100 percent loaded CPU,
and the measured latencies must be viewed relative to that baseline.

To achieve this, we introduce a kernel module and expose its control
knobs through the debugfs interface that the selftests can engage with.

The kernel module provides the following interfaces within
/sys/kernel/debug/latency_test/ for,

IPI test:
ipi_cpu_dest = Destination CPU for the IPI
ipi_cpu_src = Origin of the IPI
ipi_latency_ns = Measured latency time in ns
Timeout test:
timeout_cpu_src = CPU on which the timer is to be queued
timeout_expected_ns = Timer duration
timeout_diff_ns = Difference of actual duration vs expected timer
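
A by-hand run through the knobs might look like this (CPU number and
timeout value are examples; writing ipi_cpu_dest fires the IPI from the
writing CPU, and, per the module's timeout_write_op, writing the expected
timeout arms a timer pinned to the writing CPU):

    # fire an IPI at CPU 4, then read the measured latency
    echo 4 > /sys/kernel/debug/latency_test/ipi_cpu_dest
    cat /sys/kernel/debug/latency_test/ipi_latency_ns

    # arm a 10 us timer on CPU 4, then read the wakeup overshoot
    taskset -c 4 sh -c 'echo 10000 > /sys/kernel/debug/latency_test/timeout_expected_ns'
    cat /sys/kernel/debug/latency_test/timeout_diff_ns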


Sample output on a POWER9 system is as follows:
# --IPI Latency Test---
# Baseline Average IPI latency(ns): 3114
# Observed Average IPI latency(ns) - Snooze: 3265
# Observed Average IPI latency(ns) - Stop0_lite: 3507
# Observed Average IPI latency(ns) - Stop0: 3739
# Observed Average IPI latency(ns) - Stop2: 3807
# Observed Average IPI latency(ns) - Stop4: 17070
# Observed Average IPI latency(ns) - Stop5: 1038174
# 
# --Timeout Latency Test--
# Baseline Average timeout diff(ns): 1420
# Observed Average timeout diff(ns) - Snooze: 1640
# Observed Average timeout diff(ns) - Stop0_lite: 1764
# Observed Average timeout diff(ns) - Stop0: 1715
# Observed Average timeout diff(ns) - Stop2: 1845
# Observed Average timeout diff(ns) - Stop4: 16581
# Observed Average timeout diff(ns) - Stop5: 939977

Pratik R. Sampat (2):
  powerpc/cpuidle: Extract IPI based and timer based wakeup latency from
idle states
  powerpc/selftest: Add support for cpuidle latency measurement

 arch/powerpc/kernel/Makefile  |   1 +
 arch/powerpc/kernel/test-cpuidle_latency.c| 157 +++
 lib/Kconfig.debug |  10 +
 tools/testing/selftests/powerpc/Makefile  |   1 +
 .../powerpc/cpuidle_latency/.gitignore|   2 +
 .../powerpc/cpuidle_latency/Makefile  |   6 +
 .../cpuidle_latency/cpuidle_latency.sh| 419 ++
 .../powerpc/cpuidle_latency/settings  |   1 +
 8 files changed, 597 insertions(+)
 create mode 100644 arch/powerpc/kernel/test-cpuidle_latency.c
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/.gitignore
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/Makefile
 create mode 100755 
tools/testing/selftests/powerpc/cpuidle_latency/cpuidle_latency.sh
 create mode 100644 tools/testing/selftests/powerpc/cpuidle_latency/settings

-- 
2.17.1



Re: [PATCH] fs: btrfs: Disable BTRFS on platforms having 256K pages

2021-06-11 Thread David Sterba
On Thu, Jun 10, 2021 at 05:23:02AM +, Christophe Leroy wrote:
> With a config having PAGE_SIZE set to 256K, BTRFS build fails
> with the following message
> 
>  include/linux/compiler_types.h:326:38: error: call to 
> '__compiletime_assert_791' declared with attribute error: BUILD_BUG_ON 
> failed: (BTRFS_MAX_COMPRESSED % PAGE_SIZE) != 0
> 
> BTRFS_MAX_COMPRESSED being 128K, BTRFS cannot support platforms with
> 256K pages at the time being.
> 
> There are two platforms that can select 256K pages:
>  - hexagon
>  - powerpc
> 
> Disable BTRFS when 256K page size is selected.
> 
> Reported-by: kernel test robot 
> Signed-off-by: Christophe Leroy 

With updated changelog added to misc-next, thanks.


Re: [PATCH 6/6] mm/mremap: hold the rmap lock in write mode when moving page table entries.

2021-06-11 Thread Jann Horn
On Thu, Jun 10, 2021 at 10:35 AM Aneesh Kumar K.V
 wrote:
> To avoid a race between rmap walk and mremap, mremap does take_rmap_locks().
> The lock was taken to ensure that rmap walk doesn't miss a page table entry due to
> PTE moves via move_pagetables(). The kernel does further optimization of
> this lock such that if we are going to find the newly added vma after the
> old vma, the rmap lock is not taken. This is because rmap walk would find the
> vmas in the same order and if we don't find the page table attached to
> older vma we would find it with the new vma which we would iterate later.
[...]
> Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
> Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")

probably also "Cc: sta...@vger.kernel.org"?


[PATCH] powerpc/boot: add zImage.lds to targets

2021-06-11 Thread Nicholas Piggin
This prevents spurious rebuilds of the lds and then wrappers.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/boot/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index be84a72f8258..405acd2df160 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -229,7 +229,7 @@ $(obj)/wrapper.a: $(obj-wlib) FORCE
 
 hostprogs  := addnote hack-coff mktree
 
-targets+= $(patsubst $(obj)/%,%,$(obj-boot) wrapper.a)
+targets+= $(patsubst $(obj)/%,%,$(obj-boot) wrapper.a) zImage.lds
 extra-y:= $(obj)/wrapper.a $(obj-plat) $(obj)/empty.o \
  $(obj)/zImage.lds $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds
 
-- 
2.23.0



[PATCH] powerpc/build: vdso linker warning for orphan sections

2021-06-11 Thread Nicholas Piggin
Add --orphan-handling=warn for vdsos, and adjust vdso linker scripts to
deal with orphan sections.

Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/kernel/vdso32/Makefile | 2 +-
 arch/powerpc/kernel/vdso32/vdso32.lds.S | 3 ++-
 arch/powerpc/kernel/vdso64/Makefile | 2 +-
 arch/powerpc/kernel/vdso64/vdso64.lds.S | 3 ++-
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/vdso32/Makefile 
b/arch/powerpc/kernel/vdso32/Makefile
index 7d9a6fee0e3d..403033013848 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -66,7 +66,7 @@ include/generated/vdso32-offsets.h: $(obj)/vdso32.so.dbg FORCE
 
 # actual build commands
 quiet_cmd_vdso32ld_and_check = VDSO32L $@
-  cmd_vdso32ld_and_check = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ 
-Wl,-T$(filter %.lds,$^) $(filter %.o,$^) ; $(cmd_vdso_check)
+  cmd_vdso32ld_and_check = $(VDSOCC) $(c_flags) $(CC32FLAGS) -o $@ 
-Wl,-T$(filter %.lds,$^) $(filter %.o,$^) $(if $(CONFIG_LD_ORPHAN_WARN), 
-Wl$(comma)--orphan-handling=warn) ; $(cmd_vdso_check)
 quiet_cmd_vdso32as = VDSO32A $@
   cmd_vdso32as = $(VDSOCC) $(a_flags) $(CC32FLAGS) -c -o $@ $<
 quiet_cmd_vdso32cc = VDSO32C $@
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S 
b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 58e0099f70f4..b42e8759e67a 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -85,9 +85,10 @@ SECTIONS
 
/DISCARD/   : {
*(.note.GNU-stack)
+   *(.branch_lt)
*(.data .data.* .gnu.linkonce.d.* .sdata*)
*(.bss .sbss .dynbss .dynsbss)
-   *(.got1)
+   *(.got1 .glink .iplt .rela*)
}
 }
 
diff --git a/arch/powerpc/kernel/vdso64/Makefile 
b/arch/powerpc/kernel/vdso64/Makefile
index d783c07e558f..980dfd631a08 100644
--- a/arch/powerpc/kernel/vdso64/Makefile
+++ b/arch/powerpc/kernel/vdso64/Makefile
@@ -59,4 +59,4 @@ include/generated/vdso64-offsets.h: $(obj)/vdso64.so.dbg FORCE
 
 # actual build commands
 quiet_cmd_vdso64ld_and_check = VDSO64L $@
-  cmd_vdso64ld_and_check = $(CC) $(c_flags) -o $@ -Wl,-T$(filter %.lds,$^) 
$(filter %.o,$^); $(cmd_vdso_check)
+  cmd_vdso64ld_and_check = $(CC) $(c_flags) -o $@ -Wl,-T$(filter %.lds,$^) 
$(filter %.o,$^) $(if $(CONFIG_LD_ORPHAN_WARN), 
-Wl$(comma)--orphan-handling=warn) ; $(cmd_vdso_check)
diff --git a/arch/powerpc/kernel/vdso64/vdso64.lds.S 
b/arch/powerpc/kernel/vdso64/vdso64.lds.S
index 0288cad428b0..3750b3b15b51 100644
--- a/arch/powerpc/kernel/vdso64/vdso64.lds.S
+++ b/arch/powerpc/kernel/vdso64/vdso64.lds.S
@@ -33,7 +33,7 @@ SECTIONS
. = ALIGN(16);
.text   : {
*(.text .stub .text.* .gnu.linkonce.t.* __ftr_alt_*)
-   *(.sfpr .glink)
+   *(.sfpr)
}   :text
PROVIDE(__etext = .);
PROVIDE(_etext = .);
@@ -87,6 +87,7 @@ SECTIONS
*(.data .data.* .gnu.linkonce.d.* .sdata*)
*(.bss .sbss .dynbss .dynsbss)
*(.opd)
+   *(.glink .iplt .plt .rela*)
}
 }
 
-- 
2.23.0



Re: [PATCH v4 2/2] powerpc/64: Option to use ELF V2 ABI for big-endian kernels

2021-06-11 Thread Michal Suchánek
On Fri, Jun 11, 2021 at 11:58:19AM +0200, Michal Suchánek wrote:
> On Fri, Jun 11, 2021 at 07:39:59PM +1000, Nicholas Piggin wrote:
> > Provide an option to build big-endian kernels using the ELFv2 ABI. This
> > works on GCC only so far, although it is rumored to work with clang
> > that's not been tested yet. A new module version check ensures the
> > module ELF ABI level matches the kernel build.
> > 
> > This can give big-endian kernels some useful advantages of the ELFv2 ABI
> > (e.g., less stack usage, -mprofile-kernel, better compatibility with eBPF
> > tools).
> > 
> > BE+ELFv2 is not officially supported by the GNU toolchain, but it works
> > fine in testing and has been used by some userspace for some time (e.g.,
> > Void Linux).
> > 
> > Tested-by: Michal Suchánek 
> > Reviewed-by: Segher Boessenkool 
> > Signed-off-by: Nicholas Piggin 
> > ---
> >  arch/powerpc/Kconfig| 22 ++
> >  arch/powerpc/Makefile   | 18 --
> >  arch/powerpc/boot/Makefile  |  4 +++-
> >  arch/powerpc/include/asm/module.h   | 24 
> >  arch/powerpc/kernel/vdso64/Makefile | 13 +
> >  drivers/crypto/vmx/Makefile |  8 ++--
> >  drivers/crypto/vmx/ppc-xlate.pl | 10 ++
> >  7 files changed, 86 insertions(+), 13 deletions(-)
> > 
> > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > index 088dd2afcfe4..093f973a28b9 100644
> > --- a/arch/powerpc/Kconfig
> > +++ b/arch/powerpc/Kconfig
> > @@ -163,6 +163,7 @@ config PPC
> > select ARCH_WEAK_RELEASE_ACQUIRE
> > select BINFMT_ELF
> > select BUILDTIME_TABLE_SORT
> > +   select PPC64_BUILD_ELF_V2_ABI   if PPC64 && CPU_LITTLE_ENDIAN
> > select CLONE_BACKWARDS
> > select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
> > select DMA_OPS_BYPASS   if PPC64
> > @@ -561,6 +562,27 @@ config KEXEC_FILE
> >  config ARCH_HAS_KEXEC_PURGATORY
> > def_bool KEXEC_FILE
> >  
> > +config PPC64_BUILD_ELF_V2_ABI
> > +   bool
> > +
> > +config PPC64_BUILD_BIG_ENDIAN_ELF_V2_ABI
> > +   bool "Build big-endian kernel using ELF ABI V2 (EXPERIMENTAL)"
> > +   depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
> > +   depends on CC_IS_GCC && LD_VERSION >= 22400
> > +   default n
> > +   select PPC64_BUILD_ELF_V2_ABI
> > +   help
> > + This builds the kernel image using the "Power Architecture 64-Bit ELF
> > + V2 ABI Specification", which has a reduced stack overhead and faster
> > + function calls. This internal kernel ABI option does not affect
> > +  userspace compatibility.
> > +
> > + The V2 ABI is standard for 64-bit little-endian, but for big-endian
> > + it is less well tested by kernel and toolchain. However some distros
> > + build userspace this way, and it can produce a functioning kernel.
> > +
> > + This requires GCC and binutils 2.24 or newer.
> > +
> >  config RELOCATABLE
> > bool "Build a relocatable kernel"
> > depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
> > diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> > index 3212d076ac6a..b90b5cb799aa 100644
> > --- a/arch/powerpc/Makefile
> > +++ b/arch/powerpc/Makefile
> > @@ -91,10 +91,14 @@ endif
> >  
> >  ifdef CONFIG_PPC64
> >  ifndef CONFIG_CC_IS_CLANG
> > -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
> > -cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call 
> > cc-option,-mcall-aixdesc)
> > -aflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
> > -aflags-$(CONFIG_CPU_LITTLE_ENDIAN) += -mabi=elfv2
> > +ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
> > +cflags-y   += $(call cc-option,-mabi=elfv2)
> > +aflags-y   += $(call cc-option,-mabi=elfv2)
> > +else
> > +cflags-y   += $(call cc-option,-mabi=elfv1)
> > +cflags-y   += $(call cc-option,-mcall-aixdesc)
> > +aflags-y   += $(call cc-option,-mabi=elfv1)
> > +endif
> >  endif
> >  endif
> >  
> > @@ -142,15 +146,17 @@ endif
> >  
> >  CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no)
> >  ifndef CONFIG_CC_IS_CLANG
> > -ifdef CONFIG_CPU_LITTLE_ENDIAN
> > -CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$(call 
> > cc-option,-mcall-aixdesc))
> > +ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
> > +CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2)
> >  AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2)
> >  else
> > +# Keep these in synch with arch/powerpc/kernel/vdso64/Makefile
> >  CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1)
> >  CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcall-aixdesc)
> >  AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1)
> >  endif
> >  endif
> > +
> >  CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call 
> > cc-option,-mminimal-toc))
> >  CFLAGS-$(CONFIG_PPC64) += $(call 
> 

Re: [PATCH v4 2/2] powerpc/64: Option to use ELF V2 ABI for big-endian kernels

2021-06-11 Thread Michal Suchánek
On Fri, Jun 11, 2021 at 07:39:59PM +1000, Nicholas Piggin wrote:
> Provide an option to build big-endian kernels using the ELFv2 ABI. This
> works with GCC only so far; it is rumored to work with clang, but that
> has not been tested yet. A new module version check ensures the
> module ELF ABI level matches the kernel build.
> 
> This can give big-endian kernels some useful advantages of the ELFv2 ABI
> (e.g., less stack usage, -mprofile-kernel, better compatibility with eBPF
> tools).
> 
> BE+ELFv2 is not officially supported by the GNU toolchain, but it works
> fine in testing and has been used by some userspace for some time (e.g.,
> Void Linux).
> 
> Tested-by: Michal Suchánek 
> Reviewed-by: Segher Boessenkool 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/powerpc/Kconfig| 22 ++
>  arch/powerpc/Makefile   | 18 --
>  arch/powerpc/boot/Makefile  |  4 +++-
>  arch/powerpc/include/asm/module.h   | 24 
>  arch/powerpc/kernel/vdso64/Makefile | 13 +
>  drivers/crypto/vmx/Makefile |  8 ++--
>  drivers/crypto/vmx/ppc-xlate.pl | 10 ++
>  7 files changed, 86 insertions(+), 13 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 088dd2afcfe4..093f973a28b9 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -163,6 +163,7 @@ config PPC
>   select ARCH_WEAK_RELEASE_ACQUIRE
>   select BINFMT_ELF
>   select BUILDTIME_TABLE_SORT
> + select PPC64_BUILD_ELF_V2_ABI   if PPC64 && CPU_LITTLE_ENDIAN
>   select CLONE_BACKWARDS
>   select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
>   select DMA_OPS_BYPASS   if PPC64
> @@ -561,6 +562,27 @@ config KEXEC_FILE
>  config ARCH_HAS_KEXEC_PURGATORY
>   def_bool KEXEC_FILE
>  
> +config PPC64_BUILD_ELF_V2_ABI
> + bool
> +
> +config PPC64_BUILD_BIG_ENDIAN_ELF_V2_ABI
> + bool "Build big-endian kernel using ELF ABI V2 (EXPERIMENTAL)"
> + depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
> + depends on CC_IS_GCC && LD_VERSION >= 22400
> + default n
> + select PPC64_BUILD_ELF_V2_ABI
> + help
> +   This builds the kernel image using the "Power Architecture 64-Bit ELF
> +   V2 ABI Specification", which has a reduced stack overhead and faster
> +   function calls. This internal kernel ABI option does not affect
> +   userspace compatibility.
> +
> +   The V2 ABI is standard for 64-bit little-endian, but for big-endian
> +   it is less well tested by kernel and toolchain. However some distros
> +   build userspace this way, and it can produce a functioning kernel.
> +
> +   This requires GCC and binutils 2.24 or newer.
> +
>  config RELOCATABLE
>   bool "Build a relocatable kernel"
>   depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
> diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
> index 3212d076ac6a..b90b5cb799aa 100644
> --- a/arch/powerpc/Makefile
> +++ b/arch/powerpc/Makefile
> @@ -91,10 +91,14 @@ endif
>  
>  ifdef CONFIG_PPC64
>  ifndef CONFIG_CC_IS_CLANG
> -cflags-$(CONFIG_CPU_BIG_ENDIAN)  += $(call cc-option,-mabi=elfv1)
> -cflags-$(CONFIG_CPU_BIG_ENDIAN)  += $(call cc-option,-mcall-aixdesc)
> -aflags-$(CONFIG_CPU_BIG_ENDIAN)  += $(call cc-option,-mabi=elfv1)
> -aflags-$(CONFIG_CPU_LITTLE_ENDIAN)   += -mabi=elfv2
> +ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
> +cflags-y += $(call cc-option,-mabi=elfv2)
> +aflags-y += $(call cc-option,-mabi=elfv2)
> +else
> +cflags-y += $(call cc-option,-mabi=elfv1)
> +cflags-y += $(call cc-option,-mcall-aixdesc)
> +aflags-y += $(call cc-option,-mabi=elfv1)
> +endif
>  endif
>  endif
>  
> @@ -142,15 +146,17 @@ endif
>  
>  CFLAGS-$(CONFIG_PPC64)   := $(call cc-option,-mtraceback=no)
>  ifndef CONFIG_CC_IS_CLANG
> -ifdef CONFIG_CPU_LITTLE_ENDIAN
> -CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc))
> +ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
> +CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mabi=elfv2)
>  AFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mabi=elfv2)
>  else
> +# Keep these in synch with arch/powerpc/kernel/vdso64/Makefile
>  CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mabi=elfv1)
>  CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mcall-aixdesc)
>  AFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mabi=elfv1)
>  endif
>  endif
> +
> CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mcmodel=medium,$(call cc-option,-mminimal-toc))
> CFLAGS-$(CONFIG_PPC64)   += $(call cc-option,-mno-pointers-to-nested-functions)
>  
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 2b8da923ceca..be84a72f8258 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile

[PATCH -next 5/9] ASoC: fsl_micfil: Use devm_platform_get_and_ioremap_resource()

2021-06-11 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_micfil.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_micfil.c b/sound/soc/fsl/fsl_micfil.c
index 3cf789ed6cbe..8c0c75ce9490 100644
--- a/sound/soc/fsl/fsl_micfil.c
+++ b/sound/soc/fsl/fsl_micfil.c
@@ -669,8 +669,7 @@ static int fsl_micfil_probe(struct platform_device *pdev)
}
 
/* init regmap */
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next 8/9] ASoC: fsl_ssi: Use devm_platform_get_and_ioremap_resource()

2021-06-11 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_ssi.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_ssi.c b/sound/soc/fsl/fsl_ssi.c
index 2b57b60431bb..ecbc1c365d5b 100644
--- a/sound/soc/fsl/fsl_ssi.c
+++ b/sound/soc/fsl/fsl_ssi.c
@@ -1503,8 +1503,7 @@ static int fsl_ssi_probe(struct platform_device *pdev)
}
ssi->cpu_dai_drv.name = dev_name(dev);
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   iomem = devm_ioremap_resource(dev, res);
+   iomem = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(iomem))
return PTR_ERR(iomem);
ssi->ssi_phys = res->start;
-- 
2.25.1



[PATCH -next 9/9] ASoC: fsl_xcvr: check return value after calling platform_get_resource_byname()

2021-06-11 Thread Yang Yingliang
It will cause a null-ptr-deref if platform_get_resource_byname() returns NULL,
so check the return value before the resources are dereferenced.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_xcvr.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/sound/soc/fsl/fsl_xcvr.c b/sound/soc/fsl/fsl_xcvr.c
index df7c189d97dd..2e9061c5ed74 100644
--- a/sound/soc/fsl/fsl_xcvr.c
+++ b/sound/soc/fsl/fsl_xcvr.c
@@ -1202,6 +1202,10 @@ static int fsl_xcvr_probe(struct platform_device *pdev)
 
rx_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "rxfifo");
tx_res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "txfifo");
+   if (!rx_res || !tx_res) {
+   dev_err(dev, "Invalid resource\n");
+   return -EINVAL;
+   }
xcvr->dma_prms_rx.chan_name = "rx";
xcvr->dma_prms_tx.chan_name = "tx";
xcvr->dma_prms_rx.addr = rx_res->start;
-- 
2.25.1



[PATCH -next 1/9] ASoC: fsl_asrc: Use devm_platform_get_and_ioremap_resource()

2021-06-11 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_asrc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_asrc.c b/sound/soc/fsl/fsl_asrc.c
index 0e1ad8efebd3..24b41881a68f 100644
--- a/sound/soc/fsl/fsl_asrc.c
+++ b/sound/soc/fsl/fsl_asrc.c
@@ -1035,8 +1035,7 @@ static int fsl_asrc_probe(struct platform_device *pdev)
asrc->private = asrc_priv;
 
/* Get the addresses and IRQ */
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next 2/9] ASoC: fsl_aud2htx: Use devm_platform_get_and_ioremap_resource()

2021-06-11 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_aud2htx.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_aud2htx.c b/sound/soc/fsl/fsl_aud2htx.c
index a328697511f7..99ab7f0241cf 100644
--- a/sound/soc/fsl/fsl_aud2htx.c
+++ b/sound/soc/fsl/fsl_aud2htx.c
@@ -196,8 +196,7 @@ static int fsl_aud2htx_probe(struct platform_device *pdev)
 
aud2htx->pdev = pdev;
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next 6/9] ASoC: fsl_sai: Use devm_platform_get_and_ioremap_resource()

2021-06-11 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_sai.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_sai.c b/sound/soc/fsl/fsl_sai.c
index 407a45e48eee..223fcd15bfcc 100644
--- a/sound/soc/fsl/fsl_sai.c
+++ b/sound/soc/fsl/fsl_sai.c
@@ -1017,8 +1017,7 @@ static int fsl_sai_probe(struct platform_device *pdev)
 
sai->is_lsb_first = of_property_read_bool(np, "lsb-first");
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   base = devm_ioremap_resource(&pdev->dev, res);
+   base = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(base))
return PTR_ERR(base);
 
-- 
2.25.1



[PATCH -next 0/9] ASoC: fsl: Use devm_platform_get_and_ioremap_resource()

2021-06-11 Thread Yang Yingliang
patch #1 ~ #8:
  Use devm_platform_get_and_ioremap_resource()

patch #9
  check return value of platform_get_resource_byname()
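
For reference, the helper folds the old two-call pattern into one and also
hands the resource back to the caller. A sketch of its upstream shape
(roughly what drivers/base/platform.c does, from memory rather than a
verbatim quote):

void __iomem *
devm_platform_get_and_ioremap_resource(struct platform_device *pdev,
				       unsigned int index,
				       struct resource **res)
{
	struct resource *r;

	/* look up the MEM resource and report it back if the caller asked */
	r = platform_get_resource(pdev, index, IORESOURCE_MEM);
	if (res)
		*res = r;
	/* devm_ioremap_resource() turns r == NULL into an ERR_PTR */
	return devm_ioremap_resource(&pdev->dev, r);
}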

Yang Yingliang (9):
  ASoC: fsl_asrc: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_aud2htx: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_easrc: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_esai: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_micfil: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_sai: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_spdif: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_ssi: Use devm_platform_get_and_ioremap_resource()
  ASoC: fsl_xcvr: check return value after calling
platform_get_resource_byname()

 sound/soc/fsl/fsl_asrc.c| 3 +--
 sound/soc/fsl/fsl_aud2htx.c | 3 +--
 sound/soc/fsl/fsl_easrc.c   | 3 +--
 sound/soc/fsl/fsl_esai.c| 3 +--
 sound/soc/fsl/fsl_micfil.c  | 3 +--
 sound/soc/fsl/fsl_sai.c | 3 +--
 sound/soc/fsl/fsl_spdif.c   | 3 +--
 sound/soc/fsl/fsl_ssi.c | 3 +--
 sound/soc/fsl/fsl_xcvr.c| 4 
 9 files changed, 12 insertions(+), 16 deletions(-)

-- 
2.25.1



[PATCH -next 3/9] ASoC: fsl_easrc: Use devm_platform_get_and_ioremap_resource()

2021-06-11 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_easrc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_easrc.c b/sound/soc/fsl/fsl_easrc.c
index b1765c7d3bcd..19c3c3b5939e 100644
--- a/sound/soc/fsl/fsl_easrc.c
+++ b/sound/soc/fsl/fsl_easrc.c
@@ -1887,8 +1887,7 @@ static int fsl_easrc_probe(struct platform_device *pdev)
easrc->private = easrc_priv;
np = dev->of_node;
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next 4/9] ASoC: fsl_esai: Use devm_platform_get_and_ioremap_resource()

2021-06-11 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_esai.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_esai.c b/sound/soc/fsl/fsl_esai.c
index f356ae5925af..a961f837cd09 100644
--- a/sound/soc/fsl/fsl_esai.c
+++ b/sound/soc/fsl/fsl_esai.c
@@ -969,8 +969,7 @@ static int fsl_esai_probe(struct platform_device *pdev)
esai_priv->soc = of_device_get_match_data(>dev);
 
/* Get the addresses and IRQ */
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH -next 7/9] ASoC: fsl_spdif: Use devm_platform_get_and_ioremap_resource()

2021-06-11 Thread Yang Yingliang
Use devm_platform_get_and_ioremap_resource() to simplify
code.

Signed-off-by: Yang Yingliang 
---
 sound/soc/fsl/fsl_spdif.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_spdif.c b/sound/soc/fsl/fsl_spdif.c
index 2a76714eb8e6..d812a3ff5845 100644
--- a/sound/soc/fsl/fsl_spdif.c
+++ b/sound/soc/fsl/fsl_spdif.c
@@ -1355,8 +1355,7 @@ static int fsl_spdif_probe(struct platform_device *pdev)
spdif_priv->soc->tx_formats;
 
/* Get the addresses and IRQ */
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   regs = devm_ioremap_resource(&pdev->dev, res);
+   regs = devm_platform_get_and_ioremap_resource(pdev, 0, &res);
if (IS_ERR(regs))
return PTR_ERR(regs);
 
-- 
2.25.1



[PATCH v4 2/2] powerpc/64: Option to use ELF V2 ABI for big-endian kernels

2021-06-11 Thread Nicholas Piggin
Provide an option to build big-endian kernels using the ELFv2 ABI. This
works with GCC only so far; it is rumored to work with clang, but that
has not been tested yet. A new module version check ensures the
module ELF ABI level matches the kernel build.

This can give big-endian kernels some useful advantages of the ELFv2 ABI
(e.g., less stack usage, -mprofile-kernel, better compatibility with eBPF
tools).

BE+ELFv2 is not officially supported by the GNU toolchain, but it works
fine in testing and has been used by some userspace for some time (e.g.,
Void Linux).
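
(A quick way to check which ABI a given build ended up with is the ELF
header flags; recent binutils readelf decodes them, though the exact
output format varies by version:

  $ readelf -h vmlinux | grep -i flags
    Flags: 0x2, abiv2

where 0x1/abiv1 would indicate an ELFv1 build.)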

Tested-by: Michal Suchánek 
Reviewed-by: Segher Boessenkool 
Signed-off-by: Nicholas Piggin 
---
 arch/powerpc/Kconfig| 22 ++
 arch/powerpc/Makefile   | 18 --
 arch/powerpc/boot/Makefile  |  4 +++-
 arch/powerpc/include/asm/module.h   | 24 
 arch/powerpc/kernel/vdso64/Makefile | 13 +
 drivers/crypto/vmx/Makefile |  8 ++--
 drivers/crypto/vmx/ppc-xlate.pl | 10 ++
 7 files changed, 86 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 088dd2afcfe4..093f973a28b9 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -163,6 +163,7 @@ config PPC
select ARCH_WEAK_RELEASE_ACQUIRE
select BINFMT_ELF
select BUILDTIME_TABLE_SORT
+   select PPC64_BUILD_ELF_V2_ABI   if PPC64 && CPU_LITTLE_ENDIAN
select CLONE_BACKWARDS
select DCACHE_WORD_ACCESS   if PPC64 && CPU_LITTLE_ENDIAN
select DMA_OPS_BYPASS   if PPC64
@@ -561,6 +562,27 @@ config KEXEC_FILE
 config ARCH_HAS_KEXEC_PURGATORY
def_bool KEXEC_FILE
 
+config PPC64_BUILD_ELF_V2_ABI
+   bool
+
+config PPC64_BUILD_BIG_ENDIAN_ELF_V2_ABI
+   bool "Build big-endian kernel using ELF ABI V2 (EXPERIMENTAL)"
+   depends on PPC64 && CPU_BIG_ENDIAN && EXPERT
+   depends on CC_IS_GCC && LD_VERSION >= 22400
+   default n
+   select PPC64_BUILD_ELF_V2_ABI
+   help
+ This builds the kernel image using the "Power Architecture 64-Bit ELF
+ V2 ABI Specification", which has a reduced stack overhead and faster
+ function calls. This internal kernel ABI option does not affect
+ userspace compatibility.
+
+ The V2 ABI is standard for 64-bit little-endian, but for big-endian
+ it is less well tested by kernel and toolchain. However some distros
+ build userspace this way, and it can produce a functioning kernel.
+
+ This requires GCC and binutils 2.24 or newer.
+
 config RELOCATABLE
bool "Build a relocatable kernel"
depends on PPC64 || (FLATMEM && (44x || FSL_BOOKE))
diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 3212d076ac6a..b90b5cb799aa 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -91,10 +91,14 @@ endif
 
 ifdef CONFIG_PPC64
 ifndef CONFIG_CC_IS_CLANG
-cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
-cflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mcall-aixdesc)
-aflags-$(CONFIG_CPU_BIG_ENDIAN)+= $(call cc-option,-mabi=elfv1)
-aflags-$(CONFIG_CPU_LITTLE_ENDIAN) += -mabi=elfv2
+ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
+cflags-y   += $(call cc-option,-mabi=elfv2)
+aflags-y   += $(call cc-option,-mabi=elfv2)
+else
+cflags-y   += $(call cc-option,-mabi=elfv1)
+cflags-y   += $(call cc-option,-mcall-aixdesc)
+aflags-y   += $(call cc-option,-mabi=elfv1)
+endif
 endif
 endif
 
@@ -142,15 +146,17 @@ endif
 
 CFLAGS-$(CONFIG_PPC64) := $(call cc-option,-mtraceback=no)
 ifndef CONFIG_CC_IS_CLANG
-ifdef CONFIG_CPU_LITTLE_ENDIAN
-CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2,$(call cc-option,-mcall-aixdesc))
+ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
+CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2)
 AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv2)
 else
+# Keep these in synch with arch/powerpc/kernel/vdso64/Makefile
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1)
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcall-aixdesc)
 AFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mabi=elfv1)
 endif
 endif
+
CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mcmodel=medium,$(call cc-option,-mminimal-toc))
 CFLAGS-$(CONFIG_PPC64) += $(call cc-option,-mno-pointers-to-nested-functions)
 
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 2b8da923ceca..be84a72f8258 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -40,6 +40,9 @@ BOOTCFLAGS := -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
 
 ifdef CONFIG_PPC64_BOOT_WRAPPER
 BOOTCFLAGS += -m64
+ifdef CONFIG_PPC64_BUILD_ELF_V2_ABI
+BOOTCFLAGS += $(call cc-option,-mabi=elfv2)
+endif
 else
BOOTCFLAGS += -m32

[PATCH v4 1/2] module: add elf_check_module_arch for module specific elf arch checks

2021-06-11 Thread Nicholas Piggin
The elf_check_arch() function is used to test usermode binaries, but
kernel modules may have more specific requirements. powerpc would like
to test for ABI version compatibility.

Add an arch-overridable function elf_check_module_arch() that defaults
to elf_check_arch() and use it in elf_validity_check().
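
As an illustration of the hook (a hypothetical override, not the actual
powerpc patch): an architecture defines the macro in its asm/module.h,
which linux/moduleloader.h then picks up instead of the elf_check_arch()
default. The e_flags encoding below is an assumption for the example:

/* hypothetical arch/<arch>/include/asm/module.h fragment */
static inline bool elf_check_module_arch(Elf_Ehdr *hdr)
{
	/* assumed: the low two e_flags bits carry the ELF ABI level */
	return elf_check_arch(hdr) && (hdr->e_flags & 3) == 2;
}
#define elf_check_module_arch elf_check_module_arch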

Signed-off-by: Michael Ellerman 
[np: split patch, added changelog]
Signed-off-by: Nicholas Piggin 
---
 include/linux/moduleloader.h | 5 +
 kernel/module.c  | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
index 9e09d11ffe5b..fdc042a84562 100644
--- a/include/linux/moduleloader.h
+++ b/include/linux/moduleloader.h
@@ -13,6 +13,11 @@
  * must be implemented by each architecture.
  */
 
+// Allow arch to optionally do additional checking of module ELF header
+#ifndef elf_check_module_arch
+#define elf_check_module_arch elf_check_arch
+#endif
+
 /* Adjust arch-specific sections.  Return 0 on success.  */
 int module_frob_arch_sections(Elf_Ehdr *hdr,
  Elf_Shdr *sechdrs,
diff --git a/kernel/module.c b/kernel/module.c
index 7e78dfabca97..7c3f9b7478dc 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -2946,7 +2946,7 @@ static int elf_validity_check(struct load_info *info)
 
if (memcmp(info->hdr->e_ident, ELFMAG, SELFMAG) != 0
|| info->hdr->e_type != ET_REL
-   || !elf_check_arch(info->hdr)
+   || !elf_check_module_arch(info->hdr)
|| info->hdr->e_shentsize != sizeof(Elf_Shdr))
return -ENOEXEC;
 
-- 
2.23.0



[PATCH v4 0/2] powerpc/64: Option to use ELF V2 ABI for big-endian

2021-06-11 Thread Nicholas Piggin
Since v3 I added Michael's module check for ELF ABI level. This requires
a change to core module code. If Jessica is happy with it, it could go
via the powerpc tree.

Thanks,
Nick

Nicholas Piggin (2):
  module: add elf_check_module_arch for module specific elf arch checks
  powerpc/64: Option to use ELF V2 ABI for big-endian kernels

 arch/powerpc/Kconfig| 22 ++
 arch/powerpc/Makefile   | 18 --
 arch/powerpc/boot/Makefile  |  4 +++-
 arch/powerpc/include/asm/module.h   | 24 
 arch/powerpc/kernel/vdso64/Makefile | 13 +
 drivers/crypto/vmx/Makefile |  8 ++--
 drivers/crypto/vmx/ppc-xlate.pl | 10 ++
 include/linux/moduleloader.h|  5 +
 kernel/module.c |  2 +-
 9 files changed, 92 insertions(+), 14 deletions(-)

-- 
2.23.0



[PATCH v2] powerpc/tau: Remove superfluous parameter in alloc_workqueue() call

2021-06-11 Thread Finn Thain
This avoids an (optional) compiler warning:

arch/powerpc/kernel/tau_6xx.c: In function 'TAU_init':
arch/powerpc/kernel/tau_6xx.c:204:30: error: too many arguments for format [-Werror=format-extra-args]
  tau_workq = alloc_workqueue("tau", WQ_UNBOUND, 1, 0);
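
(The first alloc_workqueue() parameter is a printf-style format string,
and everything after max_active is consumed as format arguments; the
prototype, roughly:

struct workqueue_struct *alloc_workqueue(const char *fmt,
					 unsigned int flags,
					 int max_active, ...);

"tau" contains no conversion specifiers, so the trailing 0 is an extra
format argument, hence the warning.)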

Reported-by: Naresh Kamboju 
Fixes: b1c6a0a10bfa ("powerpc/tau: Convert from timer to workqueue")
Signed-off-by: Finn Thain 
---
Changed since v1:
 - Improved commit log message.
---
 arch/powerpc/kernel/tau_6xx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/tau_6xx.c b/arch/powerpc/kernel/tau_6xx.c
index 6c31af7f4fa8..b9a047d92ec0 100644
--- a/arch/powerpc/kernel/tau_6xx.c
+++ b/arch/powerpc/kernel/tau_6xx.c
@@ -201,7 +201,7 @@ static int __init TAU_init(void)
tau_int_enable = IS_ENABLED(CONFIG_TAU_INT) &&
 !strcmp(cur_cpu_spec->platform, "ppc750");
 
-   tau_workq = alloc_workqueue("tau", WQ_UNBOUND, 1, 0);
+   tau_workq = alloc_workqueue("tau", WQ_UNBOUND, 1);
if (!tau_workq)
return -ENOMEM;
 
-- 
2.26.3



Re: [PATCH 2/2] powerpc/tm: Avoid SPR flush if TM is disabled

2021-06-11 Thread Christophe Leroy




On 01/10/2018 at 21:47, Breno Leitao wrote:

There is a bug in the flush_tmregs_to_thread() function, where it forces
TM SPRs to be saved to the thread even if the TM facility is disabled.

This bug could be reproduced using a simple test case:

   mtspr(SPRN_TEXASR, XX);
   sleep until load_tm == 0
   cause a coredump
   read SPRN_TEXASR in the coredump

In this case, the coredump may contain an invalid SPR, because the
current code is flushing live SPRs (Used by the last thread with TM
active) into the current thread, overwriting the latest SPRs (which were
valid).

This patch checks if TM is enabled for current task before
saving the SPRs, otherwise, the TM is lazily disabled and the thread
value is already up-to-date and could be used directly, and saving is
not required.


If this patch is still applicable, it has to be rebased.




Fixes: cd63f3cf1d5 ("powerpc/tm: Fix saving of TM SPRs in core dump")
Signed-off-by: Breno Leitao 
---
  arch/powerpc/kernel/ptrace.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 9667666eb18e..e0a2ee865032 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -138,7 +138,12 @@ static void flush_tmregs_to_thread(struct task_struct *tsk)
  
 	if (MSR_TM_SUSPENDED(mfmsr())) {
 		tm_reclaim_current(TM_CAUSE_SIGNAL);
-	} else {
+	} else if (tm_enabled(tsk)) {
+		/*
+		 * Only flush TM SPRs to the thread if TM was enabled,
+		 * otherwise (TM lazily disabled), the thread already
+		 * contains the latest SPR value
+		 */
 		tm_enable();
 		tm_save_sprs(&(tsk->thread));
 	}



Re: [PATCH 1/2] powerpc/tm: Move tm_enable definition

2021-06-11 Thread Christophe Leroy




On 01/10/2018 at 21:47, Breno Leitao wrote:

The goal of this patch is to move function tm_enabled() to tm.h in order
to allow this function to be used by other files as an inline function.

This patch also removes the double inclusion of tm.h in the traps.c
source code. One inclusion is inside a CONFIG_PPC64 ifdef block, and
another one is in the generic part. This double inclusion causes a
redefinition of tm_enable(), that is why it is being fixed here.

There is generic code (non CONFIG_PPC64, thus, non
CONFIG_PPC_TRANSACTIONAL_MEM also) using some TM definitions, which
explains why tm.h is being imported in the generic code. This is
not correct, and this code is now surrounded by a
CONFIG_PPC_TRANSACTIONAL_MEM ifdef block.


You should leave the generic inclusion and remove the second one.

Don't use #ifdef blocks when not necessary, see 
https://www.kernel.org/doc/html/latest/process/coding-style.html#conditional-compilation




These ifdef inclusion will avoid calling tm_abort_check() completely,
but it is not a problem since this function is just returning 'false' if
CONFIG_PPC_TRANSACTIONAL_MEM is not defined.


As tm_abort_check() always returns false, gcc will see it and will optimise-out the check by itself, 
no worry.
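
The pattern being relied on, as a sketch (as I recall, the stub sits next
to the full version in arch/powerpc/kernel/traps.c): with the option
disabled the helper is a constant-false inline, so the compiler deletes
the test and the branch at every call site.

struct pt_regs;

#ifndef CONFIG_PPC_TRANSACTIONAL_MEM
static inline bool tm_abort_check(struct pt_regs *regs, int cause)
{
	return false;	/* constant: callers' branches become dead code */
}
#endif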





Signed-off-by: Breno Leitao 
---
  arch/powerpc/include/asm/tm.h | 5 +
  arch/powerpc/kernel/process.c | 5 -
  arch/powerpc/kernel/traps.c   | 5 -
  3 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/tm.h b/arch/powerpc/include/asm/tm.h
index e94f6db5e367..646d45a2aaae 100644
--- a/arch/powerpc/include/asm/tm.h
+++ b/arch/powerpc/include/asm/tm.h
@@ -19,4 +19,9 @@ extern void tm_restore_sprs(struct thread_struct *thread);
  
  extern bool tm_suspend_disabled;
  
+static inline bool tm_enabled(struct task_struct *tsk)
+{
+   return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
+}
+
  #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 913c5725cdb2..c1ca2451fa3b 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -862,11 +862,6 @@ static inline bool hw_brk_match(struct arch_hw_breakpoint *a,
  
  #ifdef CONFIG_PPC_TRANSACTIONAL_MEM
  
-static inline bool tm_enabled(struct task_struct *tsk)
-{
-   return tsk && tsk->thread.regs && (tsk->thread.regs->msr & MSR_TM);
-}
-
  static void tm_reclaim_thread(struct thread_struct *thr, uint8_t cause)
  {
/*
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index c85adb858271..a3d6298b8074 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -64,7 +64,6 @@
  #include 
  #include 
  #include 
-#include <asm/tm.h>
  #include 
  #include 
  #include 
@@ -1276,9 +1275,11 @@ static int emulate_instruction(struct pt_regs *regs)
  
 	/* Emulate load/store string insn. */
 	if ((instword & PPC_INST_STRING_GEN_MASK) == PPC_INST_STRING) {
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM


This ifdef is not needed, tm_abort_check() returns false when CONFIG_PPC_TRANSACTIONAL_MEM is not 
defined.



 		if (tm_abort_check(regs,
 				   TM_CAUSE_EMULATE | TM_CAUSE_PERSISTENT))
 			return -EINVAL;
+#endif
 		PPC_WARN_EMULATED(string, regs);
 		return emulate_string_inst(regs, instword);
 	}
@@ -1508,8 +1509,10 @@ void alignment_exception(struct pt_regs *regs)
 	if (!arch_irq_disabled_regs(regs))
 		local_irq_enable();
  
+#ifdef CONFIG_PPC_TRANSACTIONAL_MEM


Same here.


 	if (tm_abort_check(regs, TM_CAUSE_ALIGNMENT | TM_CAUSE_PERSISTENT))
 		goto bail;
+#endif
  
 	/* we don't implement logging of alignment exceptions */
 	if (!(current->thread.align_ctl & PR_UNALIGN_SIGBUS))



Re: [PATCH 09/16] ps3disk: use memcpy_{from,to}_bvec

2021-06-11 Thread Christoph Hellwig
On Tue, Jun 08, 2021 at 06:48:22PM -0700, Ira Weiny wrote:
> I'm still not 100% sure that these flushes are needed, but they are not
> no-ops on every arch.  Would it be best to preserve them after the
> memcpy_to/from_bvec()?
> 
> Same thing in patch 11 and 14.

To me it seems kunmap_local should basically always call the equivalent
of flush_kernel_dcache_page.  parisc does this through
kunmap_flush_on_unmap, but none of the other architectures with VIVT
caches or other coherency issues does.

Does anyone have a history or other insights here?
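
For reference, the parisc hook looks roughly like this (a sketch from
memory of arch/parisc/include/asm/cacheflush.h; treat names as
assumptions):

#define ARCH_HAS_FLUSH_ON_KUNMAP
static inline void kunmap_flush_on_unmap(void *addr)
{
	/* flush the kernel alias so the VIVT D-cache stays coherent */
	flush_kernel_dcache_page_addr(addr);
}

kunmap_local() and kunmap_atomic() only call it when
ARCH_HAS_FLUSH_ON_KUNMAP is defined, which no other architecture does.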


Re: [PATCH] powerpc/tau: Remove redundant parameter in alloc_workqueue() call

2021-06-11 Thread Christophe Leroy

Redundant with what?

Do you mean superfluous?

On 11/06/2021 at 04:59, Finn Thain wrote:

This avoids an (optional) compiler warning:

arch/powerpc/kernel/tau_6xx.c: In function 'TAU_init':
arch/powerpc/kernel/tau_6xx.c:204:30: error: too many arguments for format [-Werror=format-extra-args]
   tau_workq = alloc_workqueue("tau", WQ_UNBOUND, 1, 0);

Reported-by: Naresh Kamboju 
Fixes: b1c6a0a10bfa ("powerpc/tau: Convert from timer to workqueue")
Signed-off-by: Finn Thain 
---
  arch/powerpc/kernel/tau_6xx.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/tau_6xx.c b/arch/powerpc/kernel/tau_6xx.c
index 6c31af7f4fa8..b9a047d92ec0 100644
--- a/arch/powerpc/kernel/tau_6xx.c
+++ b/arch/powerpc/kernel/tau_6xx.c
@@ -201,7 +201,7 @@ static int __init TAU_init(void)
tau_int_enable = IS_ENABLED(CONFIG_TAU_INT) &&
 !strcmp(cur_cpu_spec->platform, "ppc750");
  
-	tau_workq = alloc_workqueue("tau", WQ_UNBOUND, 1, 0);
+	tau_workq = alloc_workqueue("tau", WQ_UNBOUND, 1);
if (!tau_workq)
return -ENOMEM;
  



Re: [PATCH 0/1] PPC32: fix ptrace() access to FPU registers

2021-06-11 Thread Christophe Leroy




On 19/06/2019 at 14:57, Radu Rendec wrote:

On Wed, 2019-06-19 at 10:36 +1000, Daniel Axtens wrote:

Andreas Schwab <sch...@linux-m68k.org> writes:

On Jun 18 2019, Radu Rendec <radu.ren...@gmail.com> wrote:



Since you already have a working setup, it would be nice if you could
add a printk to arch_ptrace() to print the address and confirm what I
believe happens (by reading the gdb source code).


A ppc32 ptrace syscall goes through compat_arch_ptrace.


Right. I completely overlooked that part.


Ah right, and that (in ptrace32.c) contains code that will work:


/*
 * the user space code considers the floating point
 * to be an array of unsigned int (32 bits) - the
 * index passed in is based on this assumption.
 */
tmp = ((unsigned int *)child->thread.fp_state.fpr)
[FPRINDEX(index)];

FPRINDEX is defined above to deal with the various manipulations you
need to do.


Correct. Basically it does the same thing that I did in my patch: it divides
the index again by 2 (it's already divided by 4 in compat_arch_ptrace()
so it ends up divided by 8), then takes the least significant bit and
adds it to the index. I take bit 2 of the original address, which is the
same thing (because in FPRHALF() the address is already divided by 4).

So we have this in ptrace32.c:

#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
#define FPRHALF(i) (((i) - PT_FPR0) & 1)
#define FPRINDEX(i) TS_FPRWIDTH * FPRNUMBER(i) * 2 + FPRHALF(i)

index = (unsigned long) addr >> 2;
((unsigned int *)child->thread.fp_state.fpr)[FPRINDEX(index)]


And we have this in my patch:

fpidx = (addr - PT_FPR0 * sizeof(long)) / 8;
(void *)&child->thread.TS_FPR(fpidx) + (addr & 4)
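
To check that the two agree, a worked example (assuming TS_FPRWIDTH == 1
and a 32-bit tracer, so user addresses are index*4 and sizeof(long) == 4):

/* Tracer reads the second word of fpr[3]:
 *	addr = (PT_FPR0 + 2*3 + 1) * 4 = PT_FPR0*4 + 28
 *
 * ptrace32.c: index = addr >> 2 = PT_FPR0 + 7
 *	FPRNUMBER(index) = 7 >> 1 = 3, FPRHALF(index) = 7 & 1 = 1
 *	FPRINDEX(index)  = 1*3*2 + 1 = 7, i.e. register 3, word 1
 *
 * patched ptrace.c: fpidx = (addr - PT_FPR0*4) / 8 = 28/8 = 3
 *	addr & 4 = 4, i.e. register 3, second word as well
 */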


Radu: I think we want to copy that working code back into ptrace.c.


I'm not sure that would work. There's a subtle difference: the code in
ptrace32.c is always compiled on a 64-bit kernel and the user space
calling it is always 32-bit; on the other hand, the code in ptrace.c can
be compiled on either a 64-bit kernel or a 32-bit kernel and the user
space calling it always has the same "bitness" as the kernel.

One difference is the size of the CPU registers. On 64-bit they are 8
byte long and user space knows that and generates 8-byte aligned
addresses. So you have to divide the address by 8 to calculate the CPU
register index correctly, which compat_arch_ptrace() currently doesn't.

Another difference is that on 64-bit `long` is 8 bytes, so user space
can read a whole FPU register in a single ptrace call.

Now that we are all aware of compat_arch_ptrace() (which handles the
special case of a 32-bit process running on a 64-bit kernel) I would say
the patch is correct and does the right thing for both 32-bit and 64-bit
kernels and processes.


The challenge will be unpicking the awful mess of ifdefs in ptrace.c
and making it somewhat more comprehensible.


I'm not sure what ifdefs you're thinking about. The only ones used
inside arch_ptrace() are PT_FPR0, PT_FPSCR and TS_FPR, which seem to be
correct.

But perhaps it would be useful to change my patch and add a comment just
before arch_ptrace() that explains how the math is done and that the
code must work on both 32-bit and 64-bit, the user space address
assumptions, etc.

By the way, I'm not sure the code in compat_arch_ptrace() handles
PT_FPSCR correctly. It might (just because fpscr is right next to fpr[]
in memory - and that's a hack), but I can't figure out if it accesses
the right half.




Does the issue still exist? If yes, the patch has to be rebased.