Re: [PATCH v2 01/11] powerpc/powernv/ioda: Fix ref count for devices with their own PE

2020-01-07 Thread Andrew Donnellan

On 22/11/19 12:49 am, Frederic Barrat wrote:

The pci_dn structure used to store a pointer to the struct pci_dev, so
taking a reference on the device was required. However, the pci_dev
pointer was later removed from the pci_dn structure, but the reference
was kept for the npu device.
See commit 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary
pcidev from pci_dn").

We don't need to take a reference on the device when assigning the PE
as the struct pnv_ioda_pe is cleaned up at the same time as
the (physical) device is released. Doing so prevents the device from
being released, which is a problem for opencapi devices, since we want
to be able to remove them through PCI hotplug.

Now the ugly part: nvlink npu devices are not meant to be
released. Because of the above, we've always leaked a reference, and
simply removing it now is dangerous and would likely require more
work; there's currently no device release callback for nvlink devices,
for example. So to be safe, this patch keeps leaking a reference on the
npu device, but only for nvlink, not opencapi.
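
The diff itself isn't quoted in this digest; a minimal sketch of the
approach described above, inside pnv_ioda_setup_dev_PE() (the predicate
name is hypothetical):

	/*
	 * Keep the historical reference leak for NVLink NPU devices only,
	 * so that opencapi devices can still be released on hot-unplug.
	 * pdev_is_nvlink_npu() is a hypothetical helper standing in for
	 * however the platform distinguishes the two device types.
	 */
	if (pdev_is_nvlink_npu(pdev))
		pci_dev_get(pdev);	/* deliberately leaked, see above */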

CC: a...@ozlabs.ru
CC: ooh...@gmail.com
Signed-off-by: Frederic Barrat 


It took me a while to parse exactly what you're doing here.

- In pnv_ioda_setup_dev_PE(), we take a reference on the pci_dev; this 
is to protect the pointer in the pci_dn structure, but not the pointer 
in the pnv_ioda_pe structure (which doesn't need to be protected by 
taking a reference, because the lifetime of the pnv_ioda_pe is the same 
as that of the pci_dev).


- The pointer in the pci_dn structure has now been removed, so we should 
remove the pci_dev_get() accordingly.


This seems okay to me, though anything PE-related is irritatingly hairy 
so one can never be truly certain...


Reviewed-by: Andrew Donnellan 

--
Andrew Donnellan  OzLabs, ADL Canberra
a...@linux.ibm.com IBM Australia Limited



[PATCH] powerpc/papr_scm: Don't enable direct map for a region by default

2020-01-07 Thread Aneesh Kumar K.V
Setting the ND_REGION_PAGEMAP flag implies the namespace mode defaults to
fsdax mode. This also means the kernel ends up creating struct page backing
for these namespace ranges. With large namespaces, that is not the right
thing to do. We should let the user select the mode they want the namespace
to be created with.

Hence disable ND_REGION_PAGEMAP for papr_scm regions. We still keep the flag for
of_pmem because it supports only small persistent memory regions.

This is similar to what is done for x86 with commit
004f1afbe199 ("libnvdimm, pmem: direct map legacy pmem by default").

Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/platforms/pseries/papr_scm.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/papr_scm.c 
b/arch/powerpc/platforms/pseries/papr_scm.c
index 26a5ef263758..ffcd0d7a867c 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -356,7 +356,6 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
	ndr_desc.mapping = &mapping;
	ndr_desc.num_mappings = 1;
	ndr_desc.nd_set = &p->nd_set;
-	set_bit(ND_REGION_PAGEMAP, &ndr_desc.flags);

	if (p->is_volatile)
		p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
-- 
2.24.1



Re: [PATCH v3] powerpc/kernel/sysfs: Add new config option PMU_SYSFS to enable PMU SPRs sysfs file creation

2020-01-07 Thread kajoljain

Hi Michael,

On 1/7/20 4:40 PM, Michael Ellerman wrote:

Kajol Jain  writes:

diff --git a/arch/powerpc/platforms/Kconfig.cputype 
b/arch/powerpc/platforms/Kconfig.cputype
index 12543e53fa96..58c72b9b8902 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -417,6 +417,12 @@ config PPC_MM_SLICES

config PPC_HAVE_PMU_SUPPORT
	bool

+config PMU_SYSFS
+	bool
+	default n
+	help
+	  This option enables sysfs file creation for PMU SPRs like MMCR* and PMC*.

This isn't user-selectable. It needs a prompt string after `bool` to be
user-selectable.
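
For reference, a user-selectable version would carry a prompt string
after `bool`; a sketch, with illustrative prompt wording:

config PMU_SYSFS
	bool "Create sysfs files for PMU SPRs"
	help
	  This option enables sysfs file creation for PMU SPRs like
	  MMCR* and PMC*.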



Yes, you are right. My bad, I was testing by manually enabling/disabling 
PMU_SYSFS. Will update accordingly.


Thanks,

Kajol


cheers


[PATCH] powerpc/mm/hash: Fix the min context value used by userspace.

2020-01-07 Thread Aneesh Kumar K.V
Without this, the kernel can end up with SLB entries as below:

04 c00c0800 00066bde000a7510 256M ESID=c00c0  VSID=   66bde000a7 LLP:110
12 0800 00066bde000a7d90 256M ESID=0  VSID=   66bde000a7 LLP:110

Such SLB entries can result in machine check.

We noticed this with 256MB segments because that resulted in a duplicate
VSID between the first vmemmap segment and the first user segment. With
1TB segments we observe the duplication with EAs like

0x100e64b vsid for EA 0xc00db500 context 7
0x100e64b vsid for user EA 0x1b500 context 7

Such high addresses are not common, and the kernel mapping in the above
case is in the I/O remap range.

[0.00] vmalloc start = 0xc008
[0.00] IO start  = 0xc00a
[0.00] vmemmap start = 0xc00c
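
A simplified reading of the off-by-one, assuming one context per kernel
region (the real counts depend on the VA layout):

	context 1 = kernel linear map   (MAX_KERNEL_CTX_CNT  = 1)
	context 2 = vmalloc             (MAX_VMALLOC_CTX_CNT = 1)
	context 3 = I/O                 (MAX_IO_CTX_CNT      = 1)
	context 4 = vmemmap             (MAX_VMEMMAP_CTX_CNT = 1)

The old MIN_USER_CONTEXT evaluated to 1 + 1 + 1 + 1 = 4, so the first
user context landed on the last kernel context (vmemmap in this
simplified layout) and produced duplicate VSIDs like the ones above.
Adding 1 starts user contexts just past the kernel ones.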

Fixes: 0034d395f89d ("powerpc/mm/hash64: Map all the kernel regions in the same 
0xc range")
Signed-off-by: Aneesh Kumar K.V 
---
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h 
b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 15b75005bc34..516db8a2e6ca 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -601,7 +601,7 @@ extern void slb_set_size(u16 size);
  */
 #define MAX_USER_CONTEXT   ((ASM_CONST(1) << CONTEXT_BITS) - 2)
 #define MIN_USER_CONTEXT   (MAX_KERNEL_CTX_CNT + MAX_VMALLOC_CTX_CNT + \
-MAX_IO_CTX_CNT + MAX_VMEMMAP_CTX_CNT)
+MAX_IO_CTX_CNT + MAX_VMEMMAP_CTX_CNT + 1)
 /*
  * For platforms that support on 65bit VA we limit the context bits
  */
-- 
2.24.1



[Bug 206049] alg: skcipher: p8_aes_xts encryption unexpectedly succeeded on test vector "random: len=0 klen=64"; expected_error=-22, cfg="random: inplace may_sleep use_finup src_divs=[66.99%@+1

2020-01-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=206049

--- Comment #6 from Daniel Axtens (d...@axtens.net) ---
Patch sent: https://patchwork.ozlabs.org/patch/1219350/

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH] crypto: vmx/xts - reject inputs that are too short

2020-01-07 Thread Daniel Axtens
When the kernel XTS implementation was extended to deal with ciphertext
stealing in commit 8083b1bf8163 ("crypto: xts - add support for ciphertext
stealing"), a check was added to reject inputs that were too short.

However, in the vmx enablement - commit 239668419349 ("crypto: vmx/xts -
use fallback for ciphertext stealing"), that check wasn't added to the
vmx implementation. This disparity leads to errors like the following:

alg: skcipher: p8_aes_xts encryption unexpectedly succeeded on test vector 
"random: len=0 klen=64"; expected_error=-22, cfg="random: inplace may_sleep 
use_finup src_divs=[66.99%@+10, 33.1%@alignmask+1155]"

Return -EINVAL if asked to operate with a cryptlen smaller than the AES
block size: ciphertext stealing needs at least one full block to steal
from. This brings vmx in line with the generic implementation.

Reported-by: Erhard Furtner 
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206049
Fixes: 239668419349 ("crypto: vmx/xts - use fallback for ciphertext stealing")
Cc: Ard Biesheuvel 
Cc: sta...@vger.kernel.org # v5.4+
Signed-off-by: Michael Ellerman 
[dja: commit message]
Signed-off-by: Daniel Axtens 
---
 drivers/crypto/vmx/aes_xts.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/crypto/vmx/aes_xts.c b/drivers/crypto/vmx/aes_xts.c
index d59e736882f6..9fee1b1532a4 100644
--- a/drivers/crypto/vmx/aes_xts.c
+++ b/drivers/crypto/vmx/aes_xts.c
@@ -84,6 +84,9 @@ static int p8_aes_xts_crypt(struct skcipher_request *req, int 
enc)
u8 tweak[AES_BLOCK_SIZE];
int ret;
 
+   if (req->cryptlen < AES_BLOCK_SIZE)
+   return -EINVAL;
+
if (!crypto_simd_usable() || (req->cryptlen % XTS_BLOCK_SIZE) != 0) {
struct skcipher_request *subreq = skcipher_request_ctx(req);
 
-- 
2.20.1



[PATCH v12 16/22] fs/io_uring: set FOLL_PIN via pin_user_pages()

2020-01-07 Thread John Hubbard
Convert fs/io_uring to use the new pin_user_pages() call, which sets
FOLL_PIN. Setting FOLL_PIN is now required for code that requires
tracking of pinned pages, and therefore for any code that calls
put_user_page().

In partial anticipation of this work, the io_uring code was already
calling put_user_page() instead of put_page(). Therefore, in order to
convert from the get_user_pages()/put_page() model, to the
pin_user_pages()/put_user_page() model, the only change required
here is to change get_user_pages() to pin_user_pages().

Reviewed-by: Jens Axboe 
Reviewed-by: Jan Kara 
Signed-off-by: John Hubbard 
---
 fs/io_uring.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 562e3a1a1bf9..9f804cb25c61 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4815,7 +4815,7 @@ static int io_sqe_buffer_register(struct io_ring_ctx 
*ctx, void __user *arg,
 
ret = 0;
	down_read(&current->mm->mmap_sem);
-   pret = get_user_pages(ubuf, nr_pages,
+   pret = pin_user_pages(ubuf, nr_pages,
  FOLL_WRITE | FOLL_LONGTERM,
  pages, vmas);
if (pret == nr_pages) {
-- 
2.24.1



[PATCH v12 14/22] mm/process_vm_access: set FOLL_PIN via pin_user_pages_remote()

2020-01-07 Thread John Hubbard
Convert process_vm_access to use the new pin_user_pages_remote()
call, which sets FOLL_PIN. Setting FOLL_PIN is now required for
code that requires tracking of pinned pages.

Also, release the pages via put_user_page*().

Also, rename "pages" to "pinned_pages", as this makes for
easier reading of process_vm_rw_single_vec().

Reviewed-by: Jan Kara 
Reviewed-by: Jérôme Glisse 
Reviewed-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 mm/process_vm_access.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/mm/process_vm_access.c b/mm/process_vm_access.c
index 357aa7bef6c0..fd20ab675b85 100644
--- a/mm/process_vm_access.c
+++ b/mm/process_vm_access.c
@@ -42,12 +42,11 @@ static int process_vm_rw_pages(struct page **pages,
if (copy > len)
copy = len;
 
-   if (vm_write) {
+   if (vm_write)
copied = copy_page_from_iter(page, offset, copy, iter);
-   set_page_dirty_lock(page);
-   } else {
+   else
copied = copy_page_to_iter(page, offset, copy, iter);
-   }
+
len -= copied;
if (copied < copy && iov_iter_count(iter))
return -EFAULT;
@@ -96,7 +95,7 @@ static int process_vm_rw_single_vec(unsigned long addr,
flags |= FOLL_WRITE;
 
while (!rc && nr_pages && iov_iter_count(iter)) {
-   int pages = min(nr_pages, max_pages_per_loop);
+   int pinned_pages = min(nr_pages, max_pages_per_loop);
int locked = 1;
size_t bytes;
 
@@ -106,14 +105,15 @@ static int process_vm_rw_single_vec(unsigned long addr,
 * current/current->mm
 */
		down_read(&mm->mmap_sem);
-		pages = get_user_pages_remote(task, mm, pa, pages, flags,
-					      process_pages, NULL, &locked);
+		pinned_pages = pin_user_pages_remote(task, mm, pa, pinned_pages,
+						     flags, process_pages,
+						     NULL, &locked);
		if (locked)
			up_read(&mm->mmap_sem);
-   if (pages <= 0)
+   if (pinned_pages <= 0)
return -EFAULT;
 
-   bytes = pages * PAGE_SIZE - start_offset;
+   bytes = pinned_pages * PAGE_SIZE - start_offset;
if (bytes > len)
bytes = len;
 
@@ -122,10 +122,12 @@ static int process_vm_rw_single_vec(unsigned long addr,
 vm_write);
len -= bytes;
start_offset = 0;
-   nr_pages -= pages;
-   pa += pages * PAGE_SIZE;
-   while (pages)
-   put_page(process_pages[--pages]);
+   nr_pages -= pinned_pages;
+   pa += pinned_pages * PAGE_SIZE;
+
+   /* If vm_write is set, the pages need to be made dirty: */
+   put_user_pages_dirty_lock(process_pages, pinned_pages,
+ vm_write);
}
 
return rc;
-- 
2.24.1



[Bug 206049] alg: skcipher: p8_aes_xts encryption unexpectedly succeeded on test vector "random: len=0 klen=64"; expected_error=-22, cfg="random: inplace may_sleep use_finup src_divs=[66.99%@+1

2020-01-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=206049

--- Comment #5 from Erhard F. (erhar...@mailbox.org) ---
(In reply to Michael Ellerman from comment #3)
> Looks like other implementations check the size, can you try this:
> 
> diff --git a/drivers/crypto/vmx/aes_xts.c b/drivers/crypto/vmx/aes_xts.c
> index d59e736882f6..9fee1b1532a4 100644
> --- a/drivers/crypto/vmx/aes_xts.c
> +++ b/drivers/crypto/vmx/aes_xts.c
> @@ -84,6 +84,9 @@ static int p8_aes_xts_crypt(struct skcipher_request *req,
> int enc)
>   u8 tweak[AES_BLOCK_SIZE];
>   int ret;
>  
> + if (req->cryptlen < AES_BLOCK_SIZE)
> + return -EINVAL;
> +
>   if (!crypto_simd_usable() || (req->cryptlen % XTS_BLOCK_SIZE) != 0) {
>   struct skcipher_request *subreq = skcipher_request_ctx(req);
Your patch fixed it, thanks! Applied it on top of kernel 5.4.8 and the
p8_aes_xts error did not show up in subsequent reboots.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[PATCH v12 09/22] IB/umem: use get_user_pages_fast() to pin DMA pages

2020-01-07 Thread John Hubbard
And get rid of the mmap_sem calls, as part of that. Note
that get_user_pages_fast() will, if necessary, fall back to
__gup_longterm_unlocked(), which takes the mmap_sem as needed.

Reviewed-by: Leon Romanovsky 
Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c | 17 ++---
 1 file changed, 6 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 7a3b99597ead..214e87aa609d 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -266,16 +266,13 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, 
unsigned long addr,
sg = umem->sg_head.sgl;
 
while (npages) {
-		down_read(&mm->mmap_sem);
-   ret = get_user_pages(cur_base,
-min_t(unsigned long, npages,
-  PAGE_SIZE / sizeof (struct page *)),
-gup_flags | FOLL_LONGTERM,
-page_list, NULL);
-   if (ret < 0) {
-			up_read(&mm->mmap_sem);
+   ret = get_user_pages_fast(cur_base,
+ min_t(unsigned long, npages,
+   PAGE_SIZE /
+   sizeof(struct page *)),
+ gup_flags | FOLL_LONGTERM, page_list);
+   if (ret < 0)
goto umem_release;
-   }
 
cur_base += ret * PAGE_SIZE;
npages   -= ret;
@@ -283,8 +280,6 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, 
unsigned long addr,
sg = ib_umem_add_sg_table(sg, page_list, ret,
dma_get_max_seg_size(context->device->dma_device),
				&umem->sg_nents);
-
-		up_read(&mm->mmap_sem);
}
 
sg_mark_end(sg);
-- 
2.24.1



[PATCH v12 11/22] mm/gup: introduce pin_user_pages*() and FOLL_PIN

2020-01-07 Thread John Hubbard
Introduce pin_user_pages*() variations of the get_user_pages*() calls.

For now, these are placeholder calls, until the various call sites
are converted to use the correct get_user_pages*() or
pin_user_pages*() API.

These variants will eventually all set FOLL_PIN, which is also
introduced, and thoroughly documented.

pin_user_pages()
pin_user_pages_remote()
pin_user_pages_fast()

All pages that are pinned via the above calls must be unpinned via
put_user_page().

The underlying rules are:

* FOLL_PIN is a gup-internal flag, so the call sites should not directly
set it. That behavior is enforced with assertions.

* Call sites that want to indicate that they are going to do DirectIO
  ("DIO") or something with similar characteristics, should call a
  get_user_pages()-like wrapper call that sets FOLL_PIN. These wrappers
  will:
* Start with "pin_user_pages" instead of "get_user_pages". That
  makes it easy to find and audit the call sites.
* Set FOLL_PIN

* For pages that are received via FOLL_PIN, those pages must be returned
  via put_user_page().
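
A minimal usage sketch of those rules, assuming a valid user address
uaddr in a sleepable kernel context:

	struct page *pages[1];
	int nr;

	nr = pin_user_pages_fast(uaddr, 1, FOLL_WRITE, pages);
	if (nr == 1) {
		/* ... access or DMA into pages[0] ... */
		put_user_page(pages[0]);	/* balances the FOLL_PIN pin */
	}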

Thanks to Jan Kara and Vlastimil Babka for explaining the 4 cases
in this documentation. (I've reworded it and expanded upon it.)

Reviewed-by: Jan Kara 
Reviewed-by: Mike Rapoport   # Documentation
Reviewed-by: Jérôme Glisse 
Cc: Jonathan Corbet 
Cc: Ira Weiny 
Signed-off-by: John Hubbard 
---
 Documentation/core-api/index.rst  |   1 +
 Documentation/core-api/pin_user_pages.rst | 232 ++
 include/linux/mm.h|  63 --
 mm/gup.c  | 164 +--
 4 files changed, 426 insertions(+), 34 deletions(-)
 create mode 100644 Documentation/core-api/pin_user_pages.rst

diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
index ab0eae1c153a..413f7d7c8642 100644
--- a/Documentation/core-api/index.rst
+++ b/Documentation/core-api/index.rst
@@ -31,6 +31,7 @@ Core utilities
generic-radix-tree
memory-allocation
mm-api
+   pin_user_pages
gfp_mask-from-fs-io
timekeeping
boot-time-mm
diff --git a/Documentation/core-api/pin_user_pages.rst 
b/Documentation/core-api/pin_user_pages.rst
new file mode 100644
index ..71849830cd48
--- /dev/null
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -0,0 +1,232 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==================================
+pin_user_pages() and related calls
+==================================
+
+.. contents:: :local:
+
+Overview
+========
+
+This document describes the following functions::
+
+ pin_user_pages()
+ pin_user_pages_fast()
+ pin_user_pages_remote()
+
+Basic description of FOLL_PIN
+=============================
+
+FOLL_PIN and FOLL_LONGTERM are flags that can be passed to the 
get_user_pages*()
+("gup") family of functions. FOLL_PIN has significant interactions and
+interdependencies with FOLL_LONGTERM, so both are covered here.
+
+FOLL_PIN is internal to gup, meaning that it should not appear at the gup call
+sites. This allows the associated wrapper functions (pin_user_pages*() and
+others) to set the correct combination of these flags, and to check for 
problems
+as well.
+
+FOLL_LONGTERM, on the other hand, *is* allowed to be set at the gup call sites.
+This is in order to avoid creating a large number of wrapper functions to cover
+all combinations of get*(), pin*(), FOLL_LONGTERM, and more. Also, the
+pin_user_pages*() APIs are clearly distinct from the get_user_pages*() APIs, so
+that's a natural dividing line, and a good point to make separate wrapper 
calls.
+In other words, use pin_user_pages*() for DMA-pinned pages, and
+get_user_pages*() for other cases. There are four cases described later on in
+this document, to further clarify that concept.
+
+FOLL_PIN and FOLL_GET are mutually exclusive for a given gup call. However,
+multiple threads and call sites are free to pin the same struct pages, via both
+FOLL_PIN and FOLL_GET. It's just the call site that needs to choose one or the
+other, not the struct page(s).
+
+The FOLL_PIN implementation is nearly the same as FOLL_GET, except that 
FOLL_PIN
+uses a different reference counting technique.
+
+FOLL_PIN is a prerequisite to FOLL_LONGTERM. Another way of saying that is,
+FOLL_LONGTERM is a specific, more restrictive case of FOLL_PIN.
+
+Which flags are set by each wrapper
+===================================
+
+For these pin_user_pages*() functions, FOLL_PIN is OR'd in with whatever gup
+flags the caller provides. The caller is required to pass in a non-null struct
+pages* array, and the function then pins pages by incrementing each by a special
+value. For now, that value is +1, just like get_user_pages*().::
+
+ Function
+ --------
+ pin_user_pages          FOLL_PIN is always set internally by this function.
+ pin_user_pages_fast     FOLL_PIN is always set internally by this function.
+ pin_user_pages_remote   FOLL_PIN is always set internally by this function.

[PATCH v12 22/22] mm, tree-wide: rename put_user_page*() to unpin_user_page*()

2020-01-07 Thread John Hubbard
In order to provide a clearer, more symmetric API for pinning
and unpinning DMA pages. This way, pin_user_pages*() calls
match up with unpin_user_pages*() calls, and the API is a lot
closer to being self-explanatory.
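
A sketch of how a call site reads once the rename lands (same
assumptions as the pin_user_pages() example earlier in the series):

	int nr = pin_user_pages_fast(uaddr, 1, FOLL_WRITE, pages);

	if (nr == 1) {
		/* ... use pages[0] ... */
		unpin_user_page(pages[0]);	/* formerly put_user_page() */
	}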

Reviewed-by: Jan Kara 
Signed-off-by: John Hubbard 
---
 Documentation/core-api/pin_user_pages.rst   |  2 +-
 arch/powerpc/mm/book3s64/iommu_api.c|  4 +--
 drivers/gpu/drm/via/via_dmablit.c   |  4 +--
 drivers/infiniband/core/umem.c  |  2 +-
 drivers/infiniband/hw/hfi1/user_pages.c |  2 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 ++--
 drivers/infiniband/hw/qib/qib_user_pages.c  |  2 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 ++--
 drivers/infiniband/hw/usnic/usnic_uiom.c|  2 +-
 drivers/infiniband/sw/siw/siw_mem.c |  2 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c   |  4 +--
 drivers/platform/goldfish/goldfish_pipe.c   |  4 +--
 drivers/vfio/vfio_iommu_type1.c |  2 +-
 fs/io_uring.c   |  4 +--
 include/linux/mm.h  | 26 -
 mm/gup.c| 32 ++---
 mm/process_vm_access.c  |  4 +--
 net/xdp/xdp_umem.c  |  2 +-
 18 files changed, 55 insertions(+), 55 deletions(-)

diff --git a/Documentation/core-api/pin_user_pages.rst 
b/Documentation/core-api/pin_user_pages.rst
index 71849830cd48..1d490155ecd7 100644
--- a/Documentation/core-api/pin_user_pages.rst
+++ b/Documentation/core-api/pin_user_pages.rst
@@ -219,7 +219,7 @@ since the system was booted, via two new /proc/vmstat 
entries: ::
 /proc/vmstat/nr_foll_pin_requested
 
 Those are both going to show zero, unless CONFIG_DEBUG_VM is set. This is
-because there is a noticeable performance drop in put_user_page(), when they
+because there is a noticeable performance drop in unpin_user_page(), when they
 are activated.
 
 References
diff --git a/arch/powerpc/mm/book3s64/iommu_api.c 
b/arch/powerpc/mm/book3s64/iommu_api.c
index a86547822034..eba73ebd8ae5 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -168,7 +168,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, 
unsigned long ua,
 
 free_exit:
/* free the references taken */
-   put_user_pages(mem->hpages, pinned);
+   unpin_user_pages(mem->hpages, pinned);
 
vfree(mem->hpas);
kfree(mem);
@@ -214,7 +214,7 @@ static void mm_iommu_unpin(struct 
mm_iommu_table_group_mem_t *mem)
if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY)
SetPageDirty(page);
 
-   put_user_page(page);
+   unpin_user_page(page);
 
mem->hpas[i] = 0;
}
diff --git a/drivers/gpu/drm/via/via_dmablit.c 
b/drivers/gpu/drm/via/via_dmablit.c
index 37c5e572993a..719d036c9384 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -188,8 +188,8 @@ via_free_sg_info(struct pci_dev *pdev, drm_via_sg_info_t 
*vsg)
kfree(vsg->desc_pages);
/* fall through */
case dr_via_pages_locked:
-   put_user_pages_dirty_lock(vsg->pages, vsg->num_pages,
- (vsg->direction == DMA_FROM_DEVICE));
+   unpin_user_pages_dirty_lock(vsg->pages, vsg->num_pages,
+  (vsg->direction == DMA_FROM_DEVICE));
/* fall through */
case dr_via_pages_alloc:
vfree(vsg->pages);
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 55daefaa9b88..a6094766b6f5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -54,7 +54,7 @@ static void __ib_umem_release(struct ib_device *dev, struct 
ib_umem *umem, int d
 
	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_nents, 0) {
		page = sg_page_iter_page(&sg_iter);
-		put_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
+		unpin_user_pages_dirty_lock(&page, 1, umem->writable && dirty);
	}

	sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index 9a94761765c0..3b505006c0a6 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -118,7 +118,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned 
long vaddr, size_t np
 void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 size_t npages, bool dirty)
 {
-   put_user_pages_dirty_lock(p, npages, dirty);
+   unpin_user_pages_dirty_lock(p, npages, dirty);
 
if (mm) { /* during close after signal, mm can be NULL */
		atomic64_sub(npages, &mm->pinned_vm);
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c 
b/drivers/infiniband/hw/mthca/mthca_memfree.c
index 

[PATCH v12 08/22] mm/gup: allow FOLL_FORCE for get_user_pages_fast()

2020-01-07 Thread John Hubbard
Commit 817be129e6f2 ("mm: validate get_user_pages_fast flags") allowed
only FOLL_WRITE and FOLL_LONGTERM to be passed to get_user_pages_fast().
This, combined with the fact that get_user_pages_fast() falls back to
"slow gup", which *does* accept FOLL_FORCE, leads to an odd situation:
if you need FOLL_FORCE, you cannot call get_user_pages_fast().

There does not appear to be any reason for filtering out FOLL_FORCE.
There is nothing in the _fast() implementation that requires that we
avoid writing to the pages. So it appears to have been an oversight.

Fix by allowing FOLL_FORCE to be set for get_user_pages_fast().
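
A hedged sketch of the kind of caller this unblocks, e.g. a
ptrace-style forced write into a read-only private mapping (addr and
pages are assumed):

	struct page *pages[1];
	int ret;

	/* Before this patch, the WARN_ON_ONCE() made this return -EINVAL: */
	ret = get_user_pages_fast(addr, 1, FOLL_WRITE | FOLL_FORCE, pages);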

Fixes: 817be129e6f2 ("mm: validate get_user_pages_fast flags")
Cc: Christoph Hellwig 
Reviewed-by: Leon Romanovsky 
Reviewed-by: Jan Kara 
Signed-off-by: John Hubbard 
---
 mm/gup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index b61bd5c469ae..a594bc708367 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2411,7 +2411,8 @@ int get_user_pages_fast(unsigned long start, int nr_pages,
unsigned long addr, len, end;
int nr = 0, ret = 0;
 
-   if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM)))
+   if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
+  FOLL_FORCE)))
return -EINVAL;
 
start = untagged_addr(start) & PAGE_MASK;
-- 
2.24.1



[PATCH v12 06/22] mm: fix get_user_pages_remote()'s handling of FOLL_LONGTERM

2020-01-07 Thread John Hubbard
As it says in the updated comment in gup.c: current FOLL_LONGTERM
behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
FS DAX check requirement on vmas.

However, the corresponding restriction in get_user_pages_remote() was
slightly stricter than is actually required: it forbade all
FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
that do not set the "locked" arg.

Update the code and comments to loosen the restriction, allowing
FOLL_LONGTERM in some cases.

Also, copy the DAX check ("if a VMA is DAX, don't allow long term
pinning") from the VFIO call site, all the way into the internals
of get_user_pages_remote() and __gup_longterm_locked(). That is:
get_user_pages_remote() calls __gup_longterm_locked(), which in turn
calls check_dax_vmas(). This check will then be removed from the VFIO
call site in a subsequent patch.
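
A sketch of the now-permitted combination, mirroring the VFIO conversion
later in this series: FOLL_LONGTERM is accepted when the "locked"
argument is NULL:

	down_read(&mm->mmap_sem);
	ret = get_user_pages_remote(NULL, mm, vaddr, 1,
				    FOLL_WRITE | FOLL_LONGTERM,
				    page, NULL, NULL);	/* locked == NULL */
	up_read(&mm->mmap_sem);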

Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
and to Dan Williams for helping clarify the DAX refactoring.

Tested-by: Alex Williamson 
Acked-by: Alex Williamson 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Ira Weiny 
Suggested-by: Jason Gunthorpe 
Cc: Kirill A. Shutemov 
Cc: Dan Williams 
Cc: Jerome Glisse 
Signed-off-by: John Hubbard 
---
 mm/gup.c | 174 +--
 1 file changed, 92 insertions(+), 82 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 5938e29a5a8b..b61bd5c469ae 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -,88 +,6 @@ static __always_inline long 
__get_user_pages_locked(struct task_struct *tsk,
return pages_done;
 }
 
-/*
- * get_user_pages_remote() - pin user pages in memory
- * @tsk:   the task_struct to use for page fault accounting, or
- * NULL if faults are not to be recorded.
- * @mm:mm_struct of target mm
- * @start: starting user address
- * @nr_pages:  number of pages from start to pin
- * @gup_flags: flags modifying lookup behaviour
- * @pages: array that receives pointers to the pages pinned.
- * Should be at least nr_pages long. Or NULL, if caller
- * only intends to ensure the pages are faulted in.
- * @vmas:  array of pointers to vmas corresponding to each page.
- * Or NULL if the caller does not require them.
- * @locked:pointer to lock flag indicating whether lock is held and
- * subsequently whether VM_FAULT_RETRY functionality can be
- * utilised. Lock must initially be held.
- *
- * Returns either number of pages pinned (which may be less than the
- * number requested), or an error. Details about the return value:
- *
- * -- If nr_pages is 0, returns 0.
- * -- If nr_pages is >0, but no pages were pinned, returns -errno.
- * -- If nr_pages is >0, and some pages were pinned, returns the number of
- *pages pinned. Again, this may be less than nr_pages.
- *
- * The caller is responsible for releasing returned @pages, via put_page().
- *
- * @vmas are valid only as long as mmap_sem is held.
- *
- * Must be called with mmap_sem held for read or write.
- *
- * get_user_pages walks a process's page tables and takes a reference to
- * each struct page that each user address corresponds to at a given
- * instant. That is, it takes the page that would be accessed if a user
- * thread accesses the given user virtual address at that instant.
- *
- * This does not guarantee that the page exists in the user mappings when
- * get_user_pages returns, and there may even be a completely different
- * page there in some cases (eg. if mmapped pagecache has been invalidated
- * and subsequently re faulted). However it does guarantee that the page
- * won't be freed completely. And mostly callers simply care that the page
- * contains data that was valid *at some point in time*. Typically, an IO
- * or similar operation cannot guarantee anything stronger anyway because
- * locks can't be held over the syscall boundary.
- *
- * If gup_flags & FOLL_WRITE == 0, the page must not be written to. If the page
- * is written to, set_page_dirty (or set_page_dirty_lock, as appropriate) must
- * be called after the page is finished with, and before put_page is called.
- *
- * get_user_pages is typically used for fewer-copy IO operations, to get a
- * handle on the memory by some means other than accesses via the user virtual
- * addresses. The pages may be submitted for DMA to devices or accessed via
- * their kernel linear mapping (via the kmap APIs). Care should be taken to
- * use the correct cache flushing APIs.
- *
- * See also get_user_pages_fast, for performance critical applications.
- *
- * get_user_pages should be phased out in favor of
- * get_user_pages_locked|unlocked or get_user_pages_fast. Nothing
- * should use get_user_pages because it cannot pass
- * FAULT_FLAG_ALLOW_RETRY to handle_mm_fault.
- */
-long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
-   unsigned long start, unsigned long nr_pages,
-

[PATCH v12 13/22] IB/{core, hw, umem}: set FOLL_PIN via pin_user_pages*(), fix up ODP

2020-01-07 Thread John Hubbard
Convert infiniband to use the new pin_user_pages*() calls.

Also, revert earlier changes to Infiniband ODP that had it using
put_user_page(). ODP is "Case 3" in
Documentation/core-api/pin_user_pages.rst, which is to say, normal
get_user_pages() and put_page() is the API to use there.

The new pin_user_pages*() calls replace corresponding get_user_pages*()
calls, and set the FOLL_PIN flag. The FOLL_PIN flag requires that the
caller must return the pages via put_user_page*() calls, but infiniband
was already doing that as part of an earlier commit.

Reviewed-by: Jason Gunthorpe 
Signed-off-by: John Hubbard 
---
 drivers/infiniband/core/umem.c  |  2 +-
 drivers/infiniband/core/umem_odp.c  | 13 ++---
 drivers/infiniband/hw/hfi1/user_pages.c |  2 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c |  2 +-
 drivers/infiniband/hw/qib/qib_user_pages.c  |  2 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c|  2 +-
 drivers/infiniband/sw/siw/siw_mem.c |  2 +-
 8 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 214e87aa609d..55daefaa9b88 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -266,7 +266,7 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, 
unsigned long addr,
sg = umem->sg_head.sgl;
 
while (npages) {
-   ret = get_user_pages_fast(cur_base,
+   ret = pin_user_pages_fast(cur_base,
  min_t(unsigned long, npages,
PAGE_SIZE /
sizeof(struct page *)),
diff --git a/drivers/infiniband/core/umem_odp.c 
b/drivers/infiniband/core/umem_odp.c
index e42d44e501fd..abc3bb6578cc 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -308,9 +308,8 @@ EXPORT_SYMBOL(ib_umem_odp_release);
  * The function returns -EFAULT if the DMA mapping operation fails. It returns
  * -EAGAIN if a concurrent invalidation prevents us from updating the page.
  *
- * The page is released via put_user_page even if the operation failed. For
- * on-demand pinning, the page is released whenever it isn't stored in the
- * umem.
+ * The page is released via put_page even if the operation failed. For
+ * on-demand pinning, the page is released whenever it isn't stored in the umem.
  */
 static int ib_umem_odp_map_dma_single_page(
struct ib_umem_odp *umem_odp,
@@ -363,7 +362,7 @@ static int ib_umem_odp_map_dma_single_page(
}
 
 out:
-   put_user_page(page);
+   put_page(page);
return ret;
 }
 
@@ -473,7 +472,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, 
u64 user_virt,
ret = -EFAULT;
break;
}
-   put_user_page(local_page_list[j]);
+   put_page(local_page_list[j]);
continue;
}
 
@@ -500,8 +499,8 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, 
u64 user_virt,
 * ib_umem_odp_map_dma_single_page().
 */
if (npages - (j + 1) > 0)
-				put_user_pages(&local_page_list[j+1],
-					       npages - (j + 1));
+				release_pages(&local_page_list[j+1],
+					      npages - (j + 1));
break;
}
}
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c 
b/drivers/infiniband/hw/hfi1/user_pages.c
index 469acb961fbd..9a94761765c0 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -106,7 +106,7 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned 
long vaddr, size_t np
int ret;
unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0);
 
-   ret = get_user_pages_fast(vaddr, npages, gup_flags, pages);
+   ret = pin_user_pages_fast(vaddr, npages, gup_flags, pages);
if (ret < 0)
return ret;
 
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c 
b/drivers/infiniband/hw/mthca/mthca_memfree.c
index edccfd6e178f..8269ab040c21 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -472,7 +472,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct 
mthca_uar *uar,
goto out;
}
 
-   ret = get_user_pages_fast(uaddr & PAGE_MASK, 1,
+   ret = pin_user_pages_fast(uaddr & PAGE_MASK, 1,
  FOLL_WRITE | FOLL_LONGTERM, pages);
if (ret < 0)
goto out;
diff --git 

[PATCH v12 17/22] net/xdp: set FOLL_PIN via pin_user_pages()

2020-01-07 Thread John Hubbard
Convert net/xdp to use the new pin_user_pages() call, which sets
FOLL_PIN. Setting FOLL_PIN is now required for code that requires
tracking of pinned pages.

In partial anticipation of this work, the net/xdp code was already
calling put_user_page() instead of put_page(). Therefore, in order to
convert from the get_user_pages()/put_page() model, to the
pin_user_pages()/put_user_page() model, the only change required
here is to change get_user_pages() to pin_user_pages().

Acked-by: Björn Töpel 
Signed-off-by: John Hubbard 
---
 net/xdp/xdp_umem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index 3049af269fbf..d071003b5e76 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -291,7 +291,7 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
return -ENOMEM;
 
	down_read(&current->mm->mmap_sem);
-	npgs = get_user_pages(umem->address, umem->npgs,
+	npgs = pin_user_pages(umem->address, umem->npgs,
			      gup_flags | FOLL_LONGTERM, &umem->pgs[0], NULL);
	up_read(&current->mm->mmap_sem);
 
-- 
2.24.1



[PATCH v12 18/22] media/v4l2-core: pin_user_pages (FOLL_PIN) and put_user_page() conversion

2020-01-07 Thread John Hubbard
1. Change v4l2 from get_user_pages() to pin_user_pages().

2. Because all FOLL_PIN-acquired pages must be released via
put_user_page(), also convert the put_page() call over to
put_user_pages_dirty_lock().

Acked-by: Hans Verkuil 
Cc: Ira Weiny 
Signed-off-by: John Hubbard 
---
 drivers/media/v4l2-core/videobuf-dma-sg.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c 
b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 28262190c3ab..162a2633b1e3 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -183,12 +183,12 @@ static int videobuf_dma_init_user_locked(struct 
videobuf_dmabuf *dma,
dprintk(1, "init user [0x%lx+0x%lx => %d pages]\n",
data, size, dma->nr_pages);
 
-   err = get_user_pages(data & PAGE_MASK, dma->nr_pages,
+   err = pin_user_pages(data & PAGE_MASK, dma->nr_pages,
 flags | FOLL_LONGTERM, dma->pages, NULL);
 
if (err != dma->nr_pages) {
dma->nr_pages = (err >= 0) ? err : 0;
-   dprintk(1, "get_user_pages: err=%d [%d]\n", err,
+   dprintk(1, "pin_user_pages: err=%d [%d]\n", err,
dma->nr_pages);
return err < 0 ? err : -EINVAL;
}
@@ -349,11 +349,8 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
BUG_ON(dma->sglen);
 
if (dma->pages) {
-   for (i = 0; i < dma->nr_pages; i++) {
-   if (dma->direction == DMA_FROM_DEVICE)
-   set_page_dirty_lock(dma->pages[i]);
-   put_page(dma->pages[i]);
-   }
+   put_user_pages_dirty_lock(dma->pages, dma->nr_pages,
+ dma->direction == DMA_FROM_DEVICE);
kfree(dma->pages);
dma->pages = NULL;
}
-- 
2.24.1



[PATCH v12 07/22] vfio: fix FOLL_LONGTERM use, simplify get_user_pages_remote() call

2020-01-07 Thread John Hubbard
Update VFIO to take advantage of the recently loosened restriction on
FOLL_LONGTERM with get_user_pages_remote(). Also, now it is possible to
fix a bug: the VFIO caller is logically a FOLL_LONGTERM user, but it
wasn't setting FOLL_LONGTERM.

Also, remove an unnecessary pair of calls that were releasing and
reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
just in order to call page_to_pfn().

Also, now that the DAX check ("if a VMA is DAX, don't allow long
term pinning") is in the internals of get_user_pages_remote() and
__gup_longterm_locked(), there's no need for it at the VFIO call site.
So remove it.

Tested-by: Alex Williamson 
Acked-by: Alex Williamson 
Reviewed-by: Jason Gunthorpe 
Reviewed-by: Ira Weiny 
Suggested-by: Jason Gunthorpe 
Cc: Dan Williams 
Cc: Jerome Glisse 
Signed-off-by: John Hubbard 
---
 drivers/vfio/vfio_iommu_type1.c | 30 +-
 1 file changed, 5 insertions(+), 25 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ada8e6cdb88..b800fc9a0251 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -322,7 +322,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned 
long vaddr,
 {
struct page *page[1];
struct vm_area_struct *vma;
-   struct vm_area_struct *vmas[1];
unsigned int flags = 0;
int ret;
 
@@ -330,33 +329,14 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned 
long vaddr,
flags |= FOLL_WRITE;
 
	down_read(&mm->mmap_sem);
-   if (mm == current->mm) {
-   ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
-vmas);
-   } else {
-   ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
-   vmas, NULL);
-   /*
-* The lifetime of a vaddr_get_pfn() page pin is
-* userspace-controlled. In the fs-dax case this could
-* lead to indefinite stalls in filesystem operations.
-* Disallow attempts to pin fs-dax pages via this
-* interface.
-*/
-   if (ret > 0 && vma_is_fsdax(vmas[0])) {
-   ret = -EOPNOTSUPP;
-   put_page(page[0]);
-   }
-   }
-	up_read(&mm->mmap_sem);
-
+   ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
+   page, NULL, NULL);
if (ret == 1) {
*pfn = page_to_pfn(page[0]);
-   return 0;
+   ret = 0;
+   goto done;
}
 
-   down_read(>mmap_sem);
-
vaddr = untagged_addr(vaddr);
 
vma = find_vma_intersection(mm, vaddr, vaddr + 1);
@@ -366,7 +346,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned 
long vaddr,
if (is_invalid_reserved_pfn(*pfn))
ret = 0;
}
-
+done:
	up_read(&mm->mmap_sem);
return ret;
 }
-- 
2.24.1



[PATCH v12 04/22] mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages

2020-01-07 Thread John Hubbard
An upcoming patch changes and complicates the refcounting and
especially the "put page" aspects of it. In order to keep
everything clean, refactor the devmap page release routines:

* Rename put_devmap_managed_page() to page_is_devmap_managed(),
  and limit the functionality to "read only": return a bool,
  with no side effects.

* Add a new routine, put_devmap_managed_page(), to handle
  decrementing the refcount for ZONE_DEVICE pages.

* Change callers (just release_pages() and put_page()) to check
  page_is_devmap_managed() before calling the new
  put_devmap_managed_page() routine. This is a performance
  point: put_page() is a hot path, so we need to avoid non-
  inline function calls where possible.

* Rename __put_devmap_managed_page() to free_devmap_managed_page(),
  and limit the functionality to unconditionally freeing a devmap
  page.
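
A condensed sketch of the resulting put_page() shape (matching the diff
below): the cheap check stays inline, the rare devmap release moves out
of line:

	static inline void put_page(struct page *page)
	{
		page = compound_head(page);

		if (page_is_devmap_managed(page)) {	/* inline, cheap test */
			put_devmap_managed_page(page);	/* out-of-line, rare */
			return;
		}

		if (put_page_testzero(page))
			__put_page(page);
	}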

This is originally based on a separate patch by Ira Weiny, which
applied to an early version of the put_user_page() experiments.
Since then, Jérôme Glisse suggested the refactoring described above.

Cc: Christoph Hellwig 
Cc: Kirill A. Shutemov 
Suggested-by: Jérôme Glisse 
Reviewed-by: Dan Williams 
Reviewed-by: Jan Kara 
Signed-off-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 include/linux/mm.h | 18 +-
 mm/memremap.c  | 15 +--
 mm/swap.c  | 27 ++-
 3 files changed, 40 insertions(+), 20 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 80a9162b406c..e2032ff640eb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -952,9 +952,10 @@ static inline bool is_zone_device_page(const struct page 
*page)
 #endif
 
 #ifdef CONFIG_DEV_PAGEMAP_OPS
-void __put_devmap_managed_page(struct page *page);
+void free_devmap_managed_page(struct page *page);
 DECLARE_STATIC_KEY_FALSE(devmap_managed_key);
-static inline bool put_devmap_managed_page(struct page *page)
+
+static inline bool page_is_devmap_managed(struct page *page)
 {
	if (!static_branch_unlikely(&devmap_managed_key))
return false;
@@ -963,7 +964,6 @@ static inline bool put_devmap_managed_page(struct page 
*page)
switch (page->pgmap->type) {
case MEMORY_DEVICE_PRIVATE:
case MEMORY_DEVICE_FS_DAX:
-   __put_devmap_managed_page(page);
return true;
default:
break;
@@ -971,11 +971,17 @@ static inline bool put_devmap_managed_page(struct page 
*page)
return false;
 }
 
+void put_devmap_managed_page(struct page *page);
+
 #else /* CONFIG_DEV_PAGEMAP_OPS */
-static inline bool put_devmap_managed_page(struct page *page)
+static inline bool page_is_devmap_managed(struct page *page)
 {
return false;
 }
+
+static inline void put_devmap_managed_page(struct page *page)
+{
+}
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
 
 static inline bool is_device_private_page(const struct page *page)
@@ -1028,8 +1034,10 @@ static inline void put_page(struct page *page)
 * need to inform the device driver through callback. See
 * include/linux/memremap.h and HMM for details.
 */
-   if (put_devmap_managed_page(page))
+   if (page_is_devmap_managed(page)) {
+   put_devmap_managed_page(page);
return;
+   }
 
if (put_page_testzero(page))
__put_page(page);
diff --git a/mm/memremap.c b/mm/memremap.c
index f915d074ac20..4c723d2049d5 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -411,20 +411,8 @@ struct dev_pagemap *get_dev_pagemap(unsigned long pfn,
 EXPORT_SYMBOL_GPL(get_dev_pagemap);
 
 #ifdef CONFIG_DEV_PAGEMAP_OPS
-void __put_devmap_managed_page(struct page *page)
+void free_devmap_managed_page(struct page *page)
 {
-   int count = page_ref_dec_return(page);
-
-   /* still busy */
-   if (count > 1)
-   return;
-
-   /* only triggered by the dev_pagemap shutdown path */
-   if (count == 0) {
-   __put_page(page);
-   return;
-   }
-
/* notify page idle for dax */
if (!is_device_private_page(page)) {
		wake_up_var(&page->_refcount);
@@ -461,5 +449,4 @@ void __put_devmap_managed_page(struct page *page)
page->mapping = NULL;
page->pgmap->ops->page_free(page);
 }
-EXPORT_SYMBOL(__put_devmap_managed_page);
 #endif /* CONFIG_DEV_PAGEMAP_OPS */
diff --git a/mm/swap.c b/mm/swap.c
index 5341ae93861f..cf39d24ada2a 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -813,8 +813,10 @@ void release_pages(struct page **pages, int nr)
 * processing, and instead, expect a call to
 * put_page_testzero().
 */
-   if (put_devmap_managed_page(page))
+   if (page_is_devmap_managed(page)) {
+   put_devmap_managed_page(page);
continue;
+   }
}
 
page = compound_head(page);
@@ 

[PATCH v12 01/22] mm/gup: factor out duplicate code from four routines

2020-01-07 Thread John Hubbard
There are four locations in gup.c that have a fair amount of code
duplication. This means that changing one requires making the same
changes in four places, not to mention reading the same code four
times, and wondering if there are subtle differences.

Factor out the common code into static functions, thus reducing the
overall line count and the code's complexity.

Also, take the opportunity to slightly improve the efficiency of the
error cases, by doing a mass subtraction of the refcount, surrounded
by get_page()/put_page().

Also, further simplify (slightly), by waiting until the successful
end of each routine, to increment *nr.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jérôme Glisse 
Reviewed-by: Jan Kara 
Cc: Kirill A. Shutemov 
Cc: Ira Weiny 
Cc: Christoph Hellwig 
Cc: Aneesh Kumar K.V 
Signed-off-by: John Hubbard 
---
 mm/gup.c | 95 
 1 file changed, 40 insertions(+), 55 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 7646bf993b25..d56c6d6b85d3 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1978,6 +1978,29 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, 
unsigned long addr,
 }
 #endif
 
+static int record_subpages(struct page *page, unsigned long addr,
+  unsigned long end, struct page **pages)
+{
+   int nr;
+
+   for (nr = 0; addr != end; addr += PAGE_SIZE)
+   pages[nr++] = page++;
+
+   return nr;
+}
+
+static void put_compound_head(struct page *page, int refs)
+{
+   VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
+   /*
+* Calling put_page() for each ref is unnecessarily slow. Only the last
+* ref needs a put_page().
+*/
+   if (refs > 1)
+   page_ref_sub(page, refs - 1);
+   put_page(page);
+}
+
 #ifdef CONFIG_ARCH_HAS_HUGEPD
 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
  unsigned long sz)
@@ -2007,32 +2030,20 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, 
unsigned long addr,
/* hugepages are never "special" */
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-   refs = 0;
head = pte_page(pte);
-
page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
-   do {
-   VM_BUG_ON(compound_head(page) != head);
-   pages[*nr] = page;
-   (*nr)++;
-   page++;
-   refs++;
-   } while (addr += PAGE_SIZE, addr != end);
+   refs = record_subpages(page, addr, end, pages + *nr);
 
head = try_get_compound_head(head, refs);
-   if (!head) {
-   *nr -= refs;
+   if (!head)
return 0;
-   }
 
if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-   /* Could be optimized better */
-   *nr -= refs;
-   while (refs--)
-   put_page(head);
+   put_compound_head(head, refs);
return 0;
}
 
+   *nr += refs;
SetPageReferenced(head);
return 1;
 }
@@ -2079,28 +2090,19 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, 
unsigned long addr,
return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
}
 
-   refs = 0;
page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
-   do {
-   pages[*nr] = page;
-   (*nr)++;
-   page++;
-   refs++;
-   } while (addr += PAGE_SIZE, addr != end);
+   refs = record_subpages(page, addr, end, pages + *nr);
 
head = try_get_compound_head(pmd_page(orig), refs);
-   if (!head) {
-   *nr -= refs;
+   if (!head)
return 0;
-   }
 
if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
-   *nr -= refs;
-   while (refs--)
-   put_page(head);
+   put_compound_head(head, refs);
return 0;
}
 
+   *nr += refs;
SetPageReferenced(head);
return 1;
 }
@@ -2120,28 +2122,19 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, 
unsigned long addr,
return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
}
 
-   refs = 0;
page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
-   do {
-   pages[*nr] = page;
-   (*nr)++;
-   page++;
-   refs++;
-   } while (addr += PAGE_SIZE, addr != end);
+   refs = record_subpages(page, addr, end, pages + *nr);
 
head = try_get_compound_head(pud_page(orig), refs);
-   if (!head) {
-   *nr -= refs;
+   if (!head)
return 0;
-   }
 
if (unlikely(pud_val(orig) != pud_val(*pudp))) {
-   *nr -= refs;
-   while (refs--)
-   put_page(head);
+   put_compound_head(head, refs);
return 0;
}
 
+

[PATCH v12 03/22] mm: Cleanup __put_devmap_managed_page() vs ->page_free()

2020-01-07 Thread John Hubbard
From: Dan Williams 

After the removal of the device-public infrastructure there are only 2
->page_free() call backs in the kernel. One of those is a device-private
callback in the nouveau driver, the other is a generic wakeup needed in
the DAX case. In the hopes that all ->page_free() callbacks can be
migrated to common core kernel functionality, move the device-private
specific actions in __put_devmap_managed_page() under the
is_device_private_page() conditional, including the ->page_free()
callback. For the other page types just open-code the generic wakeup.

Yes, the wakeup is only needed in the MEMORY_DEVICE_FSDAX case, but it
does no harm in the MEMORY_DEVICE_DEVDAX and MEMORY_DEVICE_PCI_P2PDMA
case.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jérôme Glisse 
Cc: Jan Kara 
Cc: Ira Weiny 
Signed-off-by: Dan Williams 
Signed-off-by: John Hubbard 
---
 drivers/nvdimm/pmem.c |  6 
 mm/memremap.c | 80 ---
 2 files changed, 44 insertions(+), 42 deletions(-)

diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index ad8e4df1282b..4eae441f86c9 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -337,13 +337,7 @@ static void pmem_release_disk(void *__pmem)
put_disk(pmem->disk);
 }
 
-static void pmem_pagemap_page_free(struct page *page)
-{
-	wake_up_var(&page->_refcount);
-}
-
 static const struct dev_pagemap_ops fsdax_pagemap_ops = {
-   .page_free  = pmem_pagemap_page_free,
.kill   = pmem_pagemap_kill,
.cleanup= pmem_pagemap_cleanup,
 };
diff --git a/mm/memremap.c b/mm/memremap.c
index c51c6bd2fe34..f915d074ac20 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -27,7 +27,8 @@ static void devmap_managed_enable_put(void)
 
 static int devmap_managed_enable_get(struct dev_pagemap *pgmap)
 {
-   if (!pgmap->ops || !pgmap->ops->page_free) {
+   if (pgmap->type == MEMORY_DEVICE_PRIVATE &&
+   (!pgmap->ops || !pgmap->ops->page_free)) {
WARN(1, "Missing page_free method\n");
return -EINVAL;
}
@@ -414,44 +415,51 @@ void __put_devmap_managed_page(struct page *page)
 {
int count = page_ref_dec_return(page);
 
-   /*
-* If refcount is 1 then page is freed and refcount is stable as nobody
-* holds a reference on the page.
-*/
-   if (count == 1) {
-   /* Clear Active bit in case of parallel mark_page_accessed */
-   __ClearPageActive(page);
-   __ClearPageWaiters(page);
+   /* still busy */
+   if (count > 1)
+   return;
 
-   mem_cgroup_uncharge(page);
+   /* only triggered by the dev_pagemap shutdown path */
+   if (count == 0) {
+   __put_page(page);
+   return;
+   }
 
-   /*
-* When a device_private page is freed, the page->mapping field
-* may still contain a (stale) mapping value. For example, the
-* lower bits of page->mapping may still identify the page as
-* an anonymous page. Ultimately, this entire field is just
-* stale and wrong, and it will cause errors if not cleared.
-* One example is:
-*
-*  migrate_vma_pages()
-*migrate_vma_insert_page()
-*  page_add_new_anon_rmap()
-*__page_set_anon_rmap()
-*  ...checks page->mapping, via PageAnon(page) call,
-*and incorrectly concludes that the page is an
-*anonymous page. Therefore, it incorrectly,
-*silently fails to set up the new anon rmap.
-*
-* For other types of ZONE_DEVICE pages, migration is either
-* handled differently or not done at all, so there is no need
-* to clear page->mapping.
-*/
-   if (is_device_private_page(page))
-   page->mapping = NULL;
+   /* notify page idle for dax */
+   if (!is_device_private_page(page)) {
+		wake_up_var(&page->_refcount);
+   return;
+   }
 
-   page->pgmap->ops->page_free(page);
-   } else if (!count)
-   __put_page(page);
+   /* Clear Active bit in case of parallel mark_page_accessed */
+   __ClearPageActive(page);
+   __ClearPageWaiters(page);
+
+   mem_cgroup_uncharge(page);
+
+   /*
+* When a device_private page is freed, the page->mapping field
+* may still contain a (stale) mapping value. For example, the
+* lower bits of page->mapping may still identify the page as an
+* anonymous page. Ultimately, this entire field is just stale
+* and wrong, and it will cause errors if not cleared.  One
+* example is:
+*
+*  

[PATCH v12 12/22] goldish_pipe: convert to pin_user_pages() and put_user_page()

2020-01-07 Thread John Hubbard
1. Call the new global pin_user_pages_fast() from goldfish_pin_pages().

2. As required by pin_user_pages(), release these pages via
put_user_page(). In this case, do so via put_user_pages_dirty_lock().

That has the side effect of calling set_page_dirty_lock(), instead
of set_page_dirty(). This is probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

Another side effect is that the release code is simplified because
the page[] loop is now in gup.c instead of here, so just delete the
local release_user_pages() entirely, and call
put_user_pages_dirty_lock() directly, instead.

[1] https://lore.kernel.org/r/20190723153640.gb...@lst.de

Reviewed-by: Jan Kara 
Reviewed-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 drivers/platform/goldfish/goldfish_pipe.c | 17 +++--
 1 file changed, 3 insertions(+), 14 deletions(-)

diff --git a/drivers/platform/goldfish/goldfish_pipe.c 
b/drivers/platform/goldfish/goldfish_pipe.c
index ef50c264db71..2a5901efecde 100644
--- a/drivers/platform/goldfish/goldfish_pipe.c
+++ b/drivers/platform/goldfish/goldfish_pipe.c
@@ -274,7 +274,7 @@ static int goldfish_pin_pages(unsigned long first_page,
*iter_last_page_size = last_page_size;
}
 
-   ret = get_user_pages_fast(first_page, requested_pages,
+   ret = pin_user_pages_fast(first_page, requested_pages,
  !is_write ? FOLL_WRITE : 0,
  pages);
if (ret <= 0)
@@ -285,18 +285,6 @@ static int goldfish_pin_pages(unsigned long first_page,
return ret;
 }
 
-static void release_user_pages(struct page **pages, int pages_count,
-  int is_write, s32 consumed_size)
-{
-   int i;
-
-   for (i = 0; i < pages_count; i++) {
-   if (!is_write && consumed_size > 0)
-   set_page_dirty(pages[i]);
-   put_page(pages[i]);
-   }
-}
-
 /* Populate the call parameters, merging adjacent pages together */
 static void populate_rw_params(struct page **pages,
   int pages_count,
@@ -372,7 +360,8 @@ static int transfer_max_buffers(struct goldfish_pipe *pipe,
 
*consumed_size = pipe->command_buffer->rw_params.consumed_size;
 
-   release_user_pages(pipe->pages, pages_count, is_write, *consumed_size);
+   put_user_pages_dirty_lock(pipe->pages, pages_count,
+ !is_write && *consumed_size > 0);
 
	mutex_unlock(&pipe->lock);
return 0;
-- 
2.24.1



[PATCH v12 10/22] media/v4l2-core: set pages dirty upon releasing DMA buffers

2020-01-07 Thread John Hubbard
After DMA is complete, and the device and CPU caches are synchronized,
it's still required to mark the CPU pages as dirty, if the data was
coming from the device. However, this driver was just issuing a
bare put_page() call, without any set_page_dirty*() call.

Fix the problem, by calling set_page_dirty_lock() if the CPU pages
were potentially receiving data from the device.

Reviewed-by: Christoph Hellwig 
Acked-by: Hans Verkuil 
Cc: Mauro Carvalho Chehab 
Cc: 
Signed-off-by: John Hubbard 
---
 drivers/media/v4l2-core/videobuf-dma-sg.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/media/v4l2-core/videobuf-dma-sg.c 
b/drivers/media/v4l2-core/videobuf-dma-sg.c
index 66a6c6c236a7..28262190c3ab 100644
--- a/drivers/media/v4l2-core/videobuf-dma-sg.c
+++ b/drivers/media/v4l2-core/videobuf-dma-sg.c
@@ -349,8 +349,11 @@ int videobuf_dma_free(struct videobuf_dmabuf *dma)
BUG_ON(dma->sglen);
 
if (dma->pages) {
-   for (i = 0; i < dma->nr_pages; i++)
+   for (i = 0; i < dma->nr_pages; i++) {
+   if (dma->direction == DMA_FROM_DEVICE)
+   set_page_dirty_lock(dma->pages[i]);
put_page(dma->pages[i]);
+   }
kfree(dma->pages);
dma->pages = NULL;
}
-- 
2.24.1



[PATCH v12 02/22] mm/gup: move try_get_compound_head() to top, fix minor issues

2020-01-07 Thread John Hubbard
An upcoming patch uses try_get_compound_head() more widely,
so move it to the top of gup.c.

Also fix a tiny spelling error and a checkpatch.pl warning.

Reviewed-by: Christoph Hellwig 
Reviewed-by: Jan Kara 
Reviewed-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 mm/gup.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index d56c6d6b85d3..5938e29a5a8b 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,6 +29,21 @@ struct follow_page_context {
unsigned int page_mask;
 };
 
+/*
+ * Return the compound head page with ref appropriately incremented,
+ * or NULL if that failed.
+ */
+static inline struct page *try_get_compound_head(struct page *page, int refs)
+{
+   struct page *head = compound_head(page);
+
+   if (WARN_ON_ONCE(page_ref_count(head) < 0))
+   return NULL;
+   if (unlikely(!page_cache_add_speculative(head, refs)))
+   return NULL;
+   return head;
+}
+
 /**
  * put_user_pages_dirty_lock() - release and optionally dirty gup-pinned pages
  * @pages:  array of pages to be maybe marked dirty, and definitely released.
@@ -1807,20 +1822,6 @@ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
}
 }
 
-/*
- * Return the compund head page with ref appropriately incremented,
- * or NULL if that failed.
- */
-static inline struct page *try_get_compound_head(struct page *page, int refs)
-{
-   struct page *head = compound_head(page);
-   if (WARN_ON_ONCE(page_ref_count(head) < 0))
-   return NULL;
-   if (unlikely(!page_cache_add_speculative(head, refs)))
-   return NULL;
-   return head;
-}
-
 #ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
 static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
 unsigned int flags, struct page **pages, int *nr)
-- 
2.24.1
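
As a sketch of how the fast path consumes this helper once it sits at the top of
gup.c (caller shape only, assuming the usual gup-fast pattern):

	/* take one speculative reference on the compound head, or bail */
	head = try_get_compound_head(page, 1);
	if (!head)
		goto pte_unmap;	/* refcount was dropping to zero; retry via the slow path */

The page_cache_add_speculative() call inside is what makes this safe without holding
the page table lock: it fails instead of resurrecting a page whose refcount has
already hit zero.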



[PATCH v12 20/22] powerpc: book3s64: convert to pin_user_pages() and put_user_page()

2020-01-07 Thread John Hubbard
1. Convert from get_user_pages() to pin_user_pages().

2. As required by pin_user_pages(), release these pages via
put_user_page().

Reviewed-by: Jan Kara 
Signed-off-by: John Hubbard 
---
 arch/powerpc/mm/book3s64/iommu_api.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/iommu_api.c b/arch/powerpc/mm/book3s64/iommu_api.c
index 56cc84520577..a86547822034 100644
--- a/arch/powerpc/mm/book3s64/iommu_api.c
+++ b/arch/powerpc/mm/book3s64/iommu_api.c
@@ -103,7 +103,7 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
for (entry = 0; entry < entries; entry += chunk) {
unsigned long n = min(entries - entry, chunk);
 
-   ret = get_user_pages(ua + (entry << PAGE_SHIFT), n,
+   ret = pin_user_pages(ua + (entry << PAGE_SHIFT), n,
FOLL_WRITE | FOLL_LONGTERM,
mem->hpages + entry, NULL);
if (ret == n) {
@@ -167,9 +167,8 @@ static long mm_iommu_do_alloc(struct mm_struct *mm, unsigned long ua,
return 0;
 
 free_exit:
-   /* free the reference taken */
-   for (i = 0; i < pinned; i++)
-   put_page(mem->hpages[i]);
+   /* free the references taken */
+   put_user_pages(mem->hpages, pinned);
 
vfree(mem->hpas);
kfree(mem);
@@ -215,7 +214,8 @@ static void mm_iommu_unpin(struct mm_iommu_table_group_mem_t *mem)
if (mem->hpas[i] & MM_IOMMU_TABLE_GROUP_PAGE_DIRTY)
SetPageDirty(page);
 
-   put_page(page);
+   put_user_page(page);
+
mem->hpas[i] = 0;
}
 }
-- 
2.24.1



[PATCH v12 19/22] vfio, mm: pin_user_pages (FOLL_PIN) and put_user_page() conversion

2020-01-07 Thread John Hubbard
1. Change vfio from get_user_pages_remote() to
pin_user_pages_remote().

2. Because all FOLL_PIN-acquired pages must be released via
put_user_page(), also convert the put_page() call over to
put_user_pages_dirty_lock().

Note that this effectively changes the code's behavior in
vfio_iommu_type1.c: put_pfn(): it now ultimately calls
set_page_dirty_lock(), instead of set_page_dirty(). This is
probably more accurate.

As Christoph Hellwig put it, "set_page_dirty() is only safe if we are
dealing with a file backed page where we have reference on the inode it
hangs off." [1]

[1] https://lore.kernel.org/r/20190723153640.gb...@lst.de

Tested-by: Alex Williamson 
Acked-by: Alex Williamson 
Signed-off-by: John Hubbard 
---
 drivers/vfio/vfio_iommu_type1.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b800fc9a0251..18bfc2fc8e6d 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -309,9 +309,8 @@ static int put_pfn(unsigned long pfn, int prot)
 {
if (!is_invalid_reserved_pfn(pfn)) {
struct page *page = pfn_to_page(pfn);
-   if (prot & IOMMU_WRITE)
-   SetPageDirty(page);
-   put_page(page);
+
+		put_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE);
 		return 1;
 	}
 	return 0;
@@ -329,7 +328,7 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 		flags |= FOLL_WRITE;
 
 	down_read(&mm->mmap_sem);
-   ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
+   ret = pin_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
page, NULL, NULL);
if (ret == 1) {
*pfn = page_to_pfn(page[0]);
-- 
2.24.1



[PATCH v12 15/22] drm/via: set FOLL_PIN via pin_user_pages_fast()

2020-01-07 Thread John Hubbard
Convert drm/via to use the new pin_user_pages_fast() call, which sets
FOLL_PIN. Setting FOLL_PIN is now required for code that requires
tracking of pinned pages, and therefore for any code that calls
put_user_page().

In partial anticipation of this work, the drm/via driver was already
calling put_user_page() instead of put_page(). Therefore, in order to
convert from the get_user_pages()/put_page() model, to the
pin_user_pages()/put_user_page() model, the only change required
is to change get_user_pages() to pin_user_pages().

Acked-by: Daniel Vetter 
Reviewed-by: Jérôme Glisse 
Reviewed-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 drivers/gpu/drm/via/via_dmablit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/via/via_dmablit.c b/drivers/gpu/drm/via/via_dmablit.c
index 3db000aacd26..37c5e572993a 100644
--- a/drivers/gpu/drm/via/via_dmablit.c
+++ b/drivers/gpu/drm/via/via_dmablit.c
@@ -239,7 +239,7 @@ via_lock_all_dma_pages(drm_via_sg_info_t *vsg, drm_via_dmablit_t *xfer)
vsg->pages = vzalloc(array_size(sizeof(struct page *), vsg->num_pages));
if (NULL == vsg->pages)
return -ENOMEM;
-   ret = get_user_pages_fast((unsigned long)xfer->mem_addr,
+   ret = pin_user_pages_fast((unsigned long)xfer->mem_addr,
vsg->num_pages,
vsg->direction == DMA_FROM_DEVICE ? FOLL_WRITE : 0,
vsg->pages);
-- 
2.24.1



[PATCH v12 21/22] mm/gup_benchmark: use proper FOLL_WRITE flags instead of hard-coding "1"

2020-01-07 Thread John Hubbard
Fix the gup benchmark flags to use the symbolic FOLL_WRITE instead of
a hard-coded "1" value.

Also, clean up the filtering of gup flags a little, by just doing
it once before issuing any of the get_user_pages*() calls. This
makes it harder to overlook, instead of having little "gup_flags & 1"
phrases in the function calls.

Reviewed-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 mm/gup_benchmark.c | 9 ++---
 tools/testing/selftests/vm/gup_benchmark.c | 6 +-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/gup_benchmark.c b/mm/gup_benchmark.c
index ad9d5b1c4473..8dba38e79a9f 100644
--- a/mm/gup_benchmark.c
+++ b/mm/gup_benchmark.c
@@ -49,18 +49,21 @@ static int __gup_benchmark_ioctl(unsigned int cmd,
nr = (next - addr) / PAGE_SIZE;
}
 
+   /* Filter out most gup flags: only allow a tiny subset here: */
+   gup->flags &= FOLL_WRITE;
+
switch (cmd) {
case GUP_FAST_BENCHMARK:
-   nr = get_user_pages_fast(addr, nr, gup->flags & 1,
+   nr = get_user_pages_fast(addr, nr, gup->flags,
 pages + i);
break;
case GUP_LONGTERM_BENCHMARK:
nr = get_user_pages(addr, nr,
-   (gup->flags & 1) | FOLL_LONGTERM,
+   gup->flags | FOLL_LONGTERM,
pages + i, NULL);
break;
case GUP_BENCHMARK:
-   nr = get_user_pages(addr, nr, gup->flags & 1, pages + i,
+   nr = get_user_pages(addr, nr, gup->flags, pages + i,
NULL);
break;
default:
diff --git a/tools/testing/selftests/vm/gup_benchmark.c b/tools/testing/selftests/vm/gup_benchmark.c
index 485cf06ef013..389327e9b30a 100644
--- a/tools/testing/selftests/vm/gup_benchmark.c
+++ b/tools/testing/selftests/vm/gup_benchmark.c
@@ -18,6 +18,9 @@
 #define GUP_LONGTERM_BENCHMARK _IOWR('g', 2, struct gup_benchmark)
 #define GUP_BENCHMARK  _IOWR('g', 3, struct gup_benchmark)
 
+/* Just the flags we need, copied from mm.h: */
+#define FOLL_WRITE 0x01/* check pte is writable */
+
 struct gup_benchmark {
__u64 get_delta_usec;
__u64 put_delta_usec;
@@ -85,7 +88,8 @@ int main(int argc, char **argv)
}
 
gup.nr_pages_per_call = nr_pages;
-   gup.flags = write;
+   if (write)
+   gup.flags |= FOLL_WRITE;
 
fd = open("/sys/kernel/debug/gup_benchmark", O_RDWR);
if (fd == -1)
-- 
2.24.1
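
For anyone wanting to poke at this by hand, the selftest condenses to roughly the
following standalone program (a sketch: the struct layout and the 'g'/1 ioctl number
are as we recall them from the selftest source, so double-check before relying on them):

	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/ioctl.h>
	#include <sys/mman.h>
	#include <linux/types.h>
	#include <unistd.h>

	#define GUP_FAST_BENCHMARK	_IOWR('g', 1, struct gup_benchmark)
	#define FOLL_WRITE	0x01	/* copied from mm.h, as the selftest does */

	struct gup_benchmark {		/* layout per the selftest source */
		__u64 get_delta_usec;
		__u64 put_delta_usec;
		__u64 addr;
		__u64 size;
		__u32 nr_pages_per_call;
		__u32 flags;
		__u64 expansion[10];	/* reserved */
	};

	int main(void)
	{
		size_t size = 128 * 4096;
		struct gup_benchmark gup = { 0 };
		void *buf;
		int fd;

		buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
			   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (buf == MAP_FAILED)
			return 1;
		gup.addr = (unsigned long)buf;
		gup.size = size;
		gup.nr_pages_per_call = 128;
		gup.flags |= FOLL_WRITE;	/* symbolic, per this patch */

		fd = open("/sys/kernel/debug/gup_benchmark", O_RDWR);
		if (fd == -1 || ioctl(fd, GUP_FAST_BENCHMARK, &gup)) {
			perror("gup_benchmark");
			return 1;
		}
		printf("get: %llu us, put: %llu us\n",
		       (unsigned long long)gup.get_delta_usec,
		       (unsigned long long)gup.put_delta_usec);
		return 0;
	}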



[PATCH v12 05/22] goldish_pipe: rename local pin_user_pages() routine

2020-01-07 Thread John Hubbard
1. Avoid naming conflicts: rename the local static function from
"pin_user_pages()" to "goldfish_pin_pages()".

An upcoming patch will introduce a global pin_user_pages()
function.

Reviewed-by: Jan Kara 
Reviewed-by: Jérôme Glisse 
Reviewed-by: Ira Weiny 
Signed-off-by: John Hubbard 
---
 drivers/platform/goldfish/goldfish_pipe.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/platform/goldfish/goldfish_pipe.c b/drivers/platform/goldfish/goldfish_pipe.c
index cef0133aa47a..ef50c264db71 100644
--- a/drivers/platform/goldfish/goldfish_pipe.c
+++ b/drivers/platform/goldfish/goldfish_pipe.c
@@ -257,12 +257,12 @@ static int goldfish_pipe_error_convert(int status)
}
 }
 
-static int pin_user_pages(unsigned long first_page,
- unsigned long last_page,
- unsigned int last_page_size,
- int is_write,
- struct page *pages[MAX_BUFFERS_PER_COMMAND],
- unsigned int *iter_last_page_size)
+static int goldfish_pin_pages(unsigned long first_page,
+ unsigned long last_page,
+ unsigned int last_page_size,
+ int is_write,
+ struct page *pages[MAX_BUFFERS_PER_COMMAND],
+ unsigned int *iter_last_page_size)
 {
int ret;
int requested_pages = ((last_page - first_page) >> PAGE_SHIFT) + 1;
@@ -354,9 +354,9 @@ static int transfer_max_buffers(struct goldfish_pipe *pipe,
	if (mutex_lock_interruptible(&pipe->lock))
		return -ERESTARTSYS;
 
-	pages_count = pin_user_pages(first_page, last_page,
-				     last_page_size, is_write,
-				     pipe->pages, &iter_last_page_size);
+	pages_count = goldfish_pin_pages(first_page, last_page,
+					 last_page_size, is_write,
+					 pipe->pages, &iter_last_page_size);
	if (pages_count < 0) {
		mutex_unlock(&pipe->lock);
return pages_count;
-- 
2.24.1



[PATCH v12 00/22] mm/gup: prereqs to track dma-pinned pages: FOLL_PIN

2020-01-07 Thread John Hubbard
Hi,

The "track FOLL_PIN pages" would have been the very next patch, but it is
not included here because I'm still debugging a bug report from Leon.
Let's get all of the prerequisite work (it's been reviewed) into the tree
so that future reviews are easier. It's clear that any fixes that are
required to the tracking patch, won't affect these patches here.

This implements an API naming change (put_user_page*() -->
unpin_user_page*()), and also adds FOLL_PIN page support, up to
*but not including* actually tracking FOLL_PIN pages. It extends
the FOLL_PIN support to a few select subsystems. More subsystems will
be added in follow up work.

Christoph Hellwig, a point of interest:

a) I've moved the bulk of the code out of the inline functions, as
   requested, for the devmap changes (patch 4: "mm: devmap: refactor
   1-based refcounting for ZONE_DEVICE pages").

Changes since v11: Fixes resulting from Kirill Shutemov's review, plus
a fix for a kbuild robot-reported warning.

* Only include the first 22 patches: up to, but not including, the "track
  FOLL_PIN pages" patch.

* Improved the efficiency of put_compound_head(), by avoiding get_page()
  entirely, and instead doing the mass subtraction on one less than
  refs, followed by a final put_page().

* Got rid of the forward declaration of __gup_longterm_locked(), by
  moving get_user_pages_remote() further down in gup.c

* Got rid of a redundant page_is_devmap_managed() call, and simplified
  put_devmap_managed_page() as part of that small cleanup.

* Changed put_devmap_managed_page() to do an early out if the page is
  not devmap managed. This saves an indentation level.

* Applied the same type of change to __unpin_devmap_managed_user_page(),
  which has the same checks.

* Changed release_pages() to handle the changed put_devmap_managed_page()
  API.

* Removed EXPORT_SYMBOL(free_devmap_managed_page), as it is not required,
  after the other refactoring.

* Fixed a kbuild robot sparse warning: added "static" to
  try_pin_compound_head()'s declaration.

There is a git repo and branch, for convenience:

g...@github.com:johnhubbard/linux.git pin_user_pages_tracking_v8

For the remaining list of "changes since version N", those are all in
v11, which is here:

  https://lore.kernel.org/r/20191216222537.491123-1-jhubb...@nvidia.com


Overview:

This is a prerequisite to solving the problem of proper interactions
between file-backed pages, and [R]DMA activities, as discussed in [1],
[2], [3], and in a remarkable number of email threads since about
2017. :)

A new internal gup flag, FOLL_PIN is introduced, and thoroughly
documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the
layout lease work that Ira Weiny has been working on. That's because
these new wrapper functions provide a clean, constrained, systematically
named set of functionality that, again, is required in order to even
know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be
incrementally applied to the kernel call sites that, until now, have
been simply calling get_user_pages() ("gup"). In other words, opt-in by
changing from this:

get_user_pages() (sets FOLL_GET)
put_page()

to this:
pin_user_pages() (sets FOLL_PIN)
unpin_user_page()
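
Concretely, a driver that pins a user buffer for DMA ends up converted like this
(a sketch; unpin_user_pages() is the batch form that the put_user_page*() rename
introduces):

	/* before */
	ret = get_user_pages_fast(addr, npages, FOLL_WRITE, pages);
	...
	for (i = 0; i < ret; i++)
		put_page(pages[i]);

	/* after */
	ret = pin_user_pages_fast(addr, npages, FOLL_WRITE, pages);
	...
	unpin_user_pages(pages, ret);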


Testing:

* I've done some overall kernel testing (LTP, and a few other goodies),
  and some directed testing to exercise some of the changes. And as you
  can see, gup_benchmark is enhanced to exercise this. Basically, I've
  been able to runtime test the core get_user_pages() and
  pin_user_pages() and related routines, but not so much on several of
  the call sites--but those are generally just a couple of lines
  changed, each.

  Not much of the kernel is actually using this, which on one hand
  reduces risk quite a lot. But on the other hand, testing coverage
  is low. So I'd love it if, in particular, the Infiniband and PowerPC
  folks could do a smoke test of this series for me.

  Runtime testing for the call sites so far is pretty light:

* io_uring: Some directed tests from liburing exercise this, and
they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and
 passes.
* infiniband: Ran rdma-core tests: rdma-core/build/bin/run_tests.py
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's
  not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...

[1] Some slow progress on get_user_pages() (Apr 2, 2019): 
https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): 
https://lwn.net/Articles/774411/
[3] The trouble 

Re: PPC64: G5 & 4k/64k page size

2020-01-07 Thread Bertrand

Oups. Edit :

swapon: /dev/sdb5 : pagesize doesn't fit with _swap_ space format

On 07/01/2020 16:27, Bertrand wrote:



swapon: /dev/sdb5 : pagesize doesn't fit with space space format




[PATCH v2 0/8] Allow setting caching mode in arch_add_memory() for P2PDMA

2020-01-07 Thread Logan Gunthorpe
Hi,

The main feedback from v1 was around the interface for arch_add_memory().
Per Dan's suggestions I've renamed the mhp_restrictions structure to
mhp_modifiers and put a pgprot_t field in that structure. I've also
included a patch to drop the unused flags field.

Thanks,

Logan

--

Changes in v2:
 * Rebased onto v5.5-rc5
 * Renamed mhp_restrictions to mhp_modifiers and added the pgprot field
   to that structure instead of using an argument for
   arch_add_memory().
 * Add patch to drop the unused flags field in mhp_restrictions

A git branch is available here:

https://github.com/sbates130272/linux-p2pmem remap_pages_cache_v2

--

Currently, the page tables created using memremap_pages() are always
created with the PAGE_KERNEL caching mode. However, the P2PDMA code
is creating pages for PCI BAR memory, which should never be accessed
through the cache and should instead use either WC or UC. This still works in
most cases, on x86, because the MTRR registers typically override the
caching settings in the page tables for all of the IO memory to be
UC-. However, this tends not to work so well on other arches or
some rare x86 machines that have firmware which does not setup the
MTRR registers in this way.

Instead of this, this series proposes a change to arch_add_memory()
to take the pgprot required by the mapping which allows us to
explicitly set pagetable entries for P2PDMA memory to WC.

This changes is pretty routine for most of the arches: x86_64, s390, arm64
and powerpc simply need to thread the pgprot through to where the page
tables are setup. x86_32 unfortunately sets up the page tables at boot so
must use _set_memory_prot() to change their caching mode. ia64 and sh
don't appear to have an easy way to change the page tables so, for now
at least, we just return -EINVAL on such mappings and thus they will
not support P2PDMA memory until the work for this is done.
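
To make the flow concrete, after this series the pgprot travels roughly like this
inside memremap_pages() (a sketch built from the names patches 2, 7 and 8 introduce,
not a verbatim excerpt):

	struct mhp_modifiers modifiers = {
		.altmap = altmap,
		.pgprot = PAGE_KERNEL,
	};

	if (pgmap->type == MEMORY_DEVICE_PCI_P2PDMA)
		modifiers.pgprot = pgprot_writecombine(modifiers.pgprot);

	arch_add_memory(nid, range_start, range_len, &modifiers);

Each arch_add_memory() implementation then threads modifiers->pgprot down to its
page table setup instead of hard-coding PAGE_KERNEL.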

--

Logan Gunthorpe (8):
  mm/memory_hotplug: Drop the flags field from struct mhp_restrictions
  mm/memory_hotplug: Rename mhp_restrictions to mhp_modifiers
  x86/mm: Thread pgprot_t through init_memory_mapping()
  x86/mm: Introduce _set_memory_prot()
  powerpc/mm: Thread pgprot_t through create_section_mapping()
  s390/mm: Thread pgprot_t through vmem_add_mapping()
  mm/memory_hotplug: Add pgprot_t to mhp_modifiers
  mm/memremap: Set caching mode for PCI P2PDMA memory to WC

 arch/arm64/mm/mmu.c|  7 ++--
 arch/ia64/mm/init.c|  6 +++-
 arch/powerpc/include/asm/book3s/64/hash.h  |  3 +-
 arch/powerpc/include/asm/book3s/64/radix.h |  3 +-
 arch/powerpc/include/asm/sparsemem.h   |  3 +-
 arch/powerpc/mm/book3s64/hash_utils.c  |  5 +--
 arch/powerpc/mm/book3s64/pgtable.c |  7 ++--
 arch/powerpc/mm/book3s64/radix_pgtable.c   | 18 ++
 arch/powerpc/mm/mem.c  | 10 +++---
 arch/s390/include/asm/pgtable.h|  3 +-
 arch/s390/mm/extmem.c  |  3 +-
 arch/s390/mm/init.c|  8 ++---
 arch/s390/mm/vmem.c| 10 +++---
 arch/sh/mm/init.c  |  7 ++--
 arch/x86/include/asm/page_types.h  |  3 --
 arch/x86/include/asm/pgtable.h |  3 ++
 arch/x86/include/asm/set_memory.h  |  1 +
 arch/x86/kernel/amd_gart_64.c  |  3 +-
 arch/x86/mm/init.c |  9 ++---
 arch/x86/mm/init_32.c  | 12 +--
 arch/x86/mm/init_64.c  | 40 --
 arch/x86/mm/mm_internal.h  |  3 +-
 arch/x86/mm/pageattr.c |  7 
 arch/x86/platform/efi/efi_64.c |  3 +-
 include/linux/memory_hotplug.h | 16 -
 mm/memory_hotplug.c|  8 ++---
 mm/memremap.c  | 17 +
 27 files changed, 132 insertions(+), 86 deletions(-)

--
2.20.1


[PATCH v2 3/8] x86/mm: Thread pgprot_t through init_memory_mapping()

2020-01-07 Thread Logan Gunthorpe
In preparation for supporting a pgprot_t argument for arch_add_memory().

The prototype of init_memory_mapping() has to move, seeing that its
original location came before the definition of pgprot_t.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Signed-off-by: Logan Gunthorpe 
---
 arch/x86/include/asm/page_types.h |  3 ---
 arch/x86/include/asm/pgtable.h|  3 +++
 arch/x86/kernel/amd_gart_64.c |  3 ++-
 arch/x86/mm/init.c|  9 +
 arch/x86/mm/init_32.c |  3 ++-
 arch/x86/mm/init_64.c | 32 +--
 arch/x86/mm/mm_internal.h |  3 ++-
 arch/x86/platform/efi/efi_64.c|  3 ++-
 8 files changed, 34 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/page_types.h b/arch/x86/include/asm/page_types.h
index c85e15010f48..bf7aa2e290ef 100644
--- a/arch/x86/include/asm/page_types.h
+++ b/arch/x86/include/asm/page_types.h
@@ -73,9 +73,6 @@ static inline phys_addr_t get_max_mapped(void)
 
 bool pfn_range_is_mapped(unsigned long start_pfn, unsigned long end_pfn);
 
-extern unsigned long init_memory_mapping(unsigned long start,
-unsigned long end);
-
 extern void initmem_init(void);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index ad97dc155195..844635e02da5 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1041,6 +1041,9 @@ static inline void __meminit init_trampoline_default(void)
 
 void __init poking_init(void);
 
+unsigned long init_memory_mapping(unsigned long start,
+ unsigned long end, pgprot_t prot);
+
 # ifdef CONFIG_RANDOMIZE_MEMORY
 void __meminit init_trampoline(void);
 # else
diff --git a/arch/x86/kernel/amd_gart_64.c b/arch/x86/kernel/amd_gart_64.c
index 4e5f50236048..16133819415c 100644
--- a/arch/x86/kernel/amd_gart_64.c
+++ b/arch/x86/kernel/amd_gart_64.c
@@ -744,7 +744,8 @@ int __init gart_iommu_init(void)
 
 	start_pfn = PFN_DOWN(aper_base);
 	if (!pfn_range_is_mapped(start_pfn, end_pfn))
-		init_memory_mapping(start_pfn<<PAGE_SHIFT, end_pfn<<PAGE_SHIFT);
+		init_memory_mapping(start_pfn<<PAGE_SHIFT,
+				    end_pfn<<PAGE_SHIFT, PAGE_KERNEL);
 
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
 	add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);
 
@@ -521,7 +522,7 @@ static unsigned long __init init_range_memory_mapping(
 		 */
 		can_use_brk_pgt = max(start, (u64)pgt_buf_end<<PAGE_SHIFT) >=
 				    min(end, (u64)pgt_buf_top<<PAGE_SHIFT);
-		init_memory_mapping(start, end);
+		init_memory_mapping(start, end, PAGE_KERNEL);
diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
 				     pfn_pte((paddr & PMD_MASK) >> PAGE_SHIFT,
-					     PAGE_KERNEL_LARGE),
+					     prot),
 				     init);
 			spin_unlock(&init_mm.page_table_lock);
 			paddr_last = paddr_next;
@@ -669,7 +672,7 @@ phys_pud_init(pud_t *pud_page, unsigned long paddr, unsigned long paddr_end,
 
 static unsigned long __meminit
 phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
- unsigned long page_size_mask, bool init)
+ unsigned long page_size_mask, pgprot_t prot, bool init)
 {
unsigned long vaddr, vaddr_end, vaddr_next, paddr_next, paddr_last;
 
@@ -679,7 +682,7 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
 
if (!pgtable_l5_enabled())
return phys_pud_init((pud_t *) p4d_page, paddr, paddr_end,
-page_size_mask, init);
+page_size_mask, prot, init);
 
for (; vaddr < vaddr_end; vaddr = vaddr_next) {
p4d_t *p4d = p4d_page + p4d_index(vaddr);
@@ -702,13 +705,13 @@ phys_p4d_init(p4d_t *p4d_page, unsigned long paddr, unsigned long paddr_end,
if (!p4d_none(*p4d)) {
pud = pud_offset(p4d, 0);
paddr_last = phys_pud_init(pud, paddr, __pa(vaddr_end),
-   page_size_mask, init);
+   page_size_mask, prot, init);
continue;
}
 
pud = alloc_low_page();
paddr_last = phys_pud_init(pud, paddr, __pa(vaddr_end),
-  page_size_mask, init);
+  page_size_mask, prot, init);
 
		spin_lock(&init_mm.page_table_lock);
		p4d_populate_init(&init_mm, p4d, pud, init);
@@ -722,7 +725,7 @@ static unsigned long __meminit
 __kernel_physical_mapping_init(unsigned long paddr_start,
   unsigned long paddr_end,
   unsigned long page_size_mask,
-  bool init)
+  pgprot_t prot, bool init)
 {
bool pgd_changed = false;
unsigned long vaddr, vaddr_start, vaddr_end, vaddr_next, paddr_last;
@@ -743,13 +746,13 @@ __kernel_physical_mapping_init(unsigned long 

[PATCH v2 5/8] powerpc/mm: Thread pgprot_t through create_section_mapping()

2020-01-07 Thread Logan Gunthorpe
In preparation for supporting a pgprot_t argument for arch_add_memory().

Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Michael Ellerman 
Signed-off-by: Logan Gunthorpe 
---
 arch/powerpc/include/asm/book3s/64/hash.h  |  3 ++-
 arch/powerpc/include/asm/book3s/64/radix.h |  3 ++-
 arch/powerpc/include/asm/sparsemem.h   |  3 ++-
 arch/powerpc/mm/book3s64/hash_utils.c  |  5 +++--
 arch/powerpc/mm/book3s64/pgtable.c |  7 ---
 arch/powerpc/mm/book3s64/radix_pgtable.c   | 18 +++---
 arch/powerpc/mm/mem.c  |  5 +++--
 7 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/hash.h b/arch/powerpc/include/asm/book3s/64/hash.h
index 2781ebf6add4..6fc4520092c7 100644
--- a/arch/powerpc/include/asm/book3s/64/hash.h
+++ b/arch/powerpc/include/asm/book3s/64/hash.h
@@ -251,7 +251,8 @@ extern int __meminit hash__vmemmap_create_mapping(unsigned long start,
 extern void hash__vmemmap_remove_mapping(unsigned long start,
 					 unsigned long page_size);
 
-int hash__create_section_mapping(unsigned long start, unsigned long end, int nid);
+int hash__create_section_mapping(unsigned long start, unsigned long end,
+int nid, pgprot_t prot);
 int hash__remove_section_mapping(unsigned long start, unsigned long end);
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/book3s/64/radix.h b/arch/powerpc/include/asm/book3s/64/radix.h
index d97db3ad9aae..46799f3c3d1d 100644
--- a/arch/powerpc/include/asm/book3s/64/radix.h
+++ b/arch/powerpc/include/asm/book3s/64/radix.h
@@ -289,7 +289,8 @@ static inline unsigned long radix__get_tree_size(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int radix__create_section_mapping(unsigned long start, unsigned long end, int nid);
+int radix__create_section_mapping(unsigned long start, unsigned long end,
+ int nid, pgprot_t prot);
 int radix__remove_section_mapping(unsigned long start, unsigned long end);
 #endif /* CONFIG_MEMORY_HOTPLUG */
 #endif /* __ASSEMBLY__ */
diff --git a/arch/powerpc/include/asm/sparsemem.h b/arch/powerpc/include/asm/sparsemem.h
index 3192d454a733..c89b32443cff 100644
--- a/arch/powerpc/include/asm/sparsemem.h
+++ b/arch/powerpc/include/asm/sparsemem.h
@@ -13,7 +13,8 @@
 #endif /* CONFIG_SPARSEMEM */
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-extern int create_section_mapping(unsigned long start, unsigned long end, int nid);
+extern int create_section_mapping(unsigned long start, unsigned long end,
+ int nid, pgprot_t prot);
 extern int remove_section_mapping(unsigned long start, unsigned long end);
 
 #ifdef CONFIG_PPC_BOOK3S_64
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index b30435c7d804..276e353d5264 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -800,7 +800,8 @@ int resize_hpt_for_hotplug(unsigned long new_mem_size)
 	return 0;
 }
 
-int hash__create_section_mapping(unsigned long start, unsigned long end, int nid)
+int hash__create_section_mapping(unsigned long start, unsigned long end,
+				 int nid, pgprot_t prot)
 {
 	int rc;
 
@@ -810,7 +811,7 @@ int hash__create_section_mapping(unsigned long start, unsigned long end, int nid
}
 
rc = htab_bolt_mapping(start, end, __pa(start),
-  pgprot_val(PAGE_KERNEL), mmu_linear_psize,
+  pgprot_val(prot), mmu_linear_psize,
   mmu_kernel_ssize);
 
if (rc < 0) {
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 75483b40fcb1..b60c18d2e5c9 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -171,12 +171,13 @@ void mmu_cleanup_all(void)
 }
 
 #ifdef CONFIG_MEMORY_HOTPLUG
-int __meminit create_section_mapping(unsigned long start, unsigned long end, int nid)
+int __meminit create_section_mapping(unsigned long start, unsigned long end,
+int nid, pgprot_t prot)
 {
if (radix_enabled())
-   return radix__create_section_mapping(start, end, nid);
+   return radix__create_section_mapping(start, end, nid, prot);
 
-   return hash__create_section_mapping(start, end, nid);
+   return hash__create_section_mapping(start, end, nid, prot);
 }
 
 int __meminit remove_section_mapping(unsigned long start, unsigned long end)
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 974109bb85db..2a21fb4a22b2 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -253,7 +253,7 @@ static unsigned long next_boundary(unsigned long addr, unsigned long end)
 
 static int __meminit create_physical_mapping(unsigned long start,
 

[PATCH v2 4/8] x86/mm: Introduce _set_memory_prot()

2020-01-07 Thread Logan Gunthorpe
For use in the 32bit arch_add_memory() to set the pgprot type of the
memory to add.

Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Borislav Petkov 
Cc: "H. Peter Anvin" 
Cc: x...@kernel.org
Cc: Dave Hansen 
Cc: Andy Lutomirski 
Cc: Peter Zijlstra 
Signed-off-by: Logan Gunthorpe 
---
 arch/x86/include/asm/set_memory.h | 1 +
 arch/x86/mm/pageattr.c| 7 +++
 2 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_memory.h
index 2ee8e469dcf5..d728c2f3ad96 100644
--- a/arch/x86/include/asm/set_memory.h
+++ b/arch/x86/include/asm/set_memory.h
@@ -34,6 +34,7 @@
  * The caller is required to take care of these.
  */
 
+int _set_memory_prot(unsigned long addr, int numpages, pgprot_t prot);
 int _set_memory_uc(unsigned long addr, int numpages);
 int _set_memory_wc(unsigned long addr, int numpages);
 int _set_memory_wt(unsigned long addr, int numpages);
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 1b99ad05b117..2f1934d7b8d9 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1781,6 +1781,13 @@ static inline int cpa_clear_pages_array(struct page **pages, int numpages,
CPA_PAGES_ARRAY, pages);
 }
 
+int _set_memory_prot(unsigned long addr, int numpages, pgprot_t prot)
+{
+	return change_page_attr_set_clr(&addr, numpages, prot,
+   __pgprot(~pgprot_val(prot)), 0, 0,
+   NULL);
+}
+
 int _set_memory_uc(unsigned long addr, int numpages)
 {
/*
-- 
2.20.1
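
The trick here is worth spelling out: change_page_attr_set_clr() takes a set mask and
a clear mask, so passing prot as the set mask and __pgprot(~pgprot_val(prot)) as the
clear mask rewrites the attribute bits to exactly prot: every bit in prot is set, and
everything else is cleared. Usage then looks like the x86_32 hunk in patch 7/8 (sketch):

	/* retype an already-mapped range before hot-adding it */
	ret = _set_memory_prot(start, nr_pages, modifiers->pgprot);
	if (ret)
		return ret;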



[PATCH v2 1/8] mm/memory_hotplug: Drop the flags field from struct mhp_restrictions

2020-01-07 Thread Logan Gunthorpe
This field is not used anywhere and should therefore be removed
from the structure.

Signed-off-by: Logan Gunthorpe 
---
 include/linux/memory_hotplug.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index ba0dca6aac6e..e47a29761088 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -55,11 +55,9 @@ enum {
 
 /*
  * Restrictions for the memory hotplug:
- * flags:  MHP_ flags
  * altmap: alternative allocator for memmap array
  */
 struct mhp_restrictions {
-   unsigned long flags;
struct vmem_altmap *altmap;
 };
 
-- 
2.20.1



[PATCH v2 6/8] s390/mm: Thread pgprot_t through vmem_add_mapping()

2020-01-07 Thread Logan Gunthorpe
In preparation for supporting a pgprot_t argument for arch_add_memory().

Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Signed-off-by: Logan Gunthorpe 
---
 arch/s390/include/asm/pgtable.h |  3 ++-
 arch/s390/mm/extmem.c   |  3 ++-
 arch/s390/mm/init.c |  2 +-
 arch/s390/mm/vmem.c | 10 +-
 4 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 7b03037a8475..e667a1a96879 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1640,7 +1640,8 @@ static inline swp_entry_t __swp_entry(unsigned long type, unsigned long offset)
 
 #define kern_addr_valid(addr)   (1)
 
-extern int vmem_add_mapping(unsigned long start, unsigned long size);
+extern int vmem_add_mapping(unsigned long start, unsigned long size,
+   pgprot_t prot);
 extern int vmem_remove_mapping(unsigned long start, unsigned long size);
 extern int s390_enable_sie(void);
 extern int s390_enable_skey(void);
diff --git a/arch/s390/mm/extmem.c b/arch/s390/mm/extmem.c
index fd0dae9d10f4..6cf7029a7b35 100644
--- a/arch/s390/mm/extmem.c
+++ b/arch/s390/mm/extmem.c
@@ -313,7 +313,8 @@ __segment_load (char *name, int do_nonshared, unsigned long *addr, unsigned long
goto out_free;
}
 
-   rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1);
+   rc = vmem_add_mapping(seg->start_addr, seg->end - seg->start_addr + 1,
+ PAGE_KERNEL);
 
if (rc)
goto out_free;
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index a0c88c1c9ad0..ef19522ddad2 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -277,7 +277,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
if (WARN_ON_ONCE(modifiers->altmap))
return -EINVAL;
 
-   rc = vmem_add_mapping(start, size);
+   rc = vmem_add_mapping(start, size, PAGE_KERNEL);
if (rc)
return rc;
 
diff --git a/arch/s390/mm/vmem.c b/arch/s390/mm/vmem.c
index b403fa14847d..8a5e95f184a2 100644
--- a/arch/s390/mm/vmem.c
+++ b/arch/s390/mm/vmem.c
@@ -66,7 +66,7 @@ pte_t __ref *vmem_pte_alloc(void)
 /*
  * Add a physical memory range to the 1:1 mapping.
  */
-static int vmem_add_mem(unsigned long start, unsigned long size)
+static int vmem_add_mem(unsigned long start, unsigned long size, pgprot_t prot)
 {
unsigned long pgt_prot, sgt_prot, r3_prot;
unsigned long pages4k, pages1m, pages2g;
@@ -79,7 +79,7 @@ static int vmem_add_mem(unsigned long start, unsigned long size)
pte_t *pt_dir;
int ret = -ENOMEM;
 
-   pgt_prot = pgprot_val(PAGE_KERNEL);
+   pgt_prot = pgprot_val(prot);
sgt_prot = pgprot_val(SEGMENT_KERNEL);
r3_prot = pgprot_val(REGION3_KERNEL);
if (!MACHINE_HAS_NX) {
@@ -362,7 +362,7 @@ int vmem_remove_mapping(unsigned long start, unsigned long size)
return ret;
 }
 
-int vmem_add_mapping(unsigned long start, unsigned long size)
+int vmem_add_mapping(unsigned long start, unsigned long size, pgprot_t prot)
 {
struct memory_segment *seg;
int ret;
@@ -379,7 +379,7 @@ int vmem_add_mapping(unsigned long start, unsigned long size)
if (ret)
goto out_free;
 
-   ret = vmem_add_mem(start, size);
+   ret = vmem_add_mem(start, size, prot);
if (ret)
goto out_remove;
goto out;
@@ -403,7 +403,7 @@ void __init vmem_map_init(void)
struct memblock_region *reg;
 
for_each_memblock(memory, reg)
-   vmem_add_mem(reg->base, reg->size);
+   vmem_add_mem(reg->base, reg->size, PAGE_KERNEL);
__set_memory((unsigned long)_stext,
 (unsigned long)(_etext - _stext) >> PAGE_SHIFT,
 SET_MEMORY_RO | SET_MEMORY_X);
-- 
2.20.1



[PATCH v2 2/8] mm/memory_hotplug: Rename mhp_restrictions to mhp_modifiers

2020-01-07 Thread Logan Gunthorpe
The mhp_restrictions struct really doesn't specify anything resembling
a restriction anymore, so rename it to mhp_modifiers.

Signed-off-by: Logan Gunthorpe 
---
 arch/arm64/mm/mmu.c|  4 ++--
 arch/ia64/mm/init.c|  4 ++--
 arch/powerpc/mm/mem.c  |  4 ++--
 arch/s390/mm/init.c|  6 +++---
 arch/sh/mm/init.c  |  4 ++--
 arch/x86/mm/init_32.c  |  4 ++--
 arch/x86/mm/init_64.c  |  8 
 include/linux/memory_hotplug.h | 12 ++--
 mm/memory_hotplug.c|  8 
 mm/memremap.c  |  8 
 10 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 40797cbfba2d..3320406579c3 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1050,7 +1050,7 @@ int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size,
-   struct mhp_restrictions *restrictions)
+   struct mhp_modifiers *modifiers)
 {
int flags = 0;
 
@@ -1063,7 +1063,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
memblock_clear_nomap(start, size);
 
return __add_pages(nid, start >> PAGE_SHIFT, size >> PAGE_SHIFT,
-  restrictions);
+  modifiers);
 }
 void arch_remove_memory(int nid, u64 start, u64 size,
struct vmem_altmap *altmap)
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index b01d68a2d5d9..daf438e08b96 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -670,13 +670,13 @@ mem_init (void)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size,
-   struct mhp_restrictions *restrictions)
+   struct mhp_modifiers *modifiers)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
int ret;
 
-   ret = __add_pages(nid, start_pfn, nr_pages, restrictions);
+   ret = __add_pages(nid, start_pfn, nr_pages, modifiers);
if (ret)
printk("%s: Problem encountered in __add_pages() as ret=%d\n",
   __func__,  ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index f5535eae637f..9dd9c3c1be7f 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -127,7 +127,7 @@ static void flush_dcache_range_chunked(unsigned long start, unsigned long stop,
 }
 
 int __ref arch_add_memory(int nid, u64 start, u64 size,
-   struct mhp_restrictions *restrictions)
+ struct mhp_modifiers *modifiers)
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
@@ -143,7 +143,7 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
return -EFAULT;
}
 
-   return __add_pages(nid, start_pfn, nr_pages, restrictions);
+   return __add_pages(nid, start_pfn, nr_pages, modifiers);
 }
 
 void __ref arch_remove_memory(int nid, u64 start, u64 size,
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index ac44bd76db4b..a0c88c1c9ad0 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -268,20 +268,20 @@ device_initcall(s390_cma_mem_init);
 #endif /* CONFIG_CMA */
 
 int arch_add_memory(int nid, u64 start, u64 size,
-   struct mhp_restrictions *restrictions)
+   struct mhp_modifiers *modifiers)
 {
unsigned long start_pfn = PFN_DOWN(start);
unsigned long size_pages = PFN_DOWN(size);
int rc;
 
-   if (WARN_ON_ONCE(restrictions->altmap))
+   if (WARN_ON_ONCE(modifiers->altmap))
return -EINVAL;
 
rc = vmem_add_mapping(start, size);
if (rc)
return rc;
 
-   rc = __add_pages(nid, start_pfn, size_pages, restrictions);
+   rc = __add_pages(nid, start_pfn, size_pages, modifiers);
if (rc)
vmem_remove_mapping(start, size);
return rc;
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index d1b1ff2be17a..7e64f42fb570 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -406,14 +406,14 @@ void __init mem_init(void)
 
 #ifdef CONFIG_MEMORY_HOTPLUG
 int arch_add_memory(int nid, u64 start, u64 size,
-   struct mhp_restrictions *restrictions)
+   struct mhp_modifiers *modifiers)
 {
unsigned long start_pfn = PFN_DOWN(start);
unsigned long nr_pages = size >> PAGE_SHIFT;
int ret;
 
/* We only have ZONE_NORMAL, so this is easy.. */
-   ret = __add_pages(nid, start_pfn, nr_pages, restrictions);
+   ret = __add_pages(nid, start_pfn, nr_pages, modifiers);
if (unlikely(ret))
printk("%s: Failed, __add_pages() == %d\n", __func__, ret);
 
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 0a74407ef92e..e6fce2d547f0 100644

[PATCH v2 8/8] mm/memremap: Set caching mode for PCI P2PDMA memory to WC

2020-01-07 Thread Logan Gunthorpe
PCI BAR IO memory should never be mapped as WB; however, prior to this
the PAT bits were set to WB and were typically overridden by MTRR
registers set by the firmware.

Set PCI P2PDMA memory to be WC (writecombining) as the only current
user (the NVMe CMB) was originally mapped WC before the P2PDMA code
replaced the mapping with devm_memremap_pages().

Future use-cases may need to generalize this by adding flags to
select the caching type, as some P2PDMA cases will not want WC.
However, those use-cases are not upstream yet and this can be changed
when they arrive.

Cc: Dan Williams 
Cc: Christoph Hellwig 
Cc: Jason Gunthorpe 
Signed-off-by: Logan Gunthorpe 
---
 mm/memremap.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/memremap.c b/mm/memremap.c
index 45ab4ef0643d..d36ff688b768 100644
--- a/mm/memremap.c
+++ b/mm/memremap.c
@@ -187,7 +187,10 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid)
}
break;
case MEMORY_DEVICE_DEVDAX:
+   need_devmap_managed = false;
+   break;
case MEMORY_DEVICE_PCI_P2PDMA:
+   modifiers.pgprot = pgprot_writecombine(modifiers.pgprot);
need_devmap_managed = false;
break;
default:
-- 
2.20.1



[PATCH v2 7/8] mm/memory_hotplug: Add pgprot_t to mhp_modifiers

2020-01-07 Thread Logan Gunthorpe
devm_memremap_pages() is currently used by the PCI P2PDMA code to create
struct page mappings for IO memory. At present, these mappings are created
with PAGE_KERNEL, which implies setting the PAT bits to WB. However, on
x86, an MTRR register will typically override this and force the cache
type to UC-. In the case the firmware doesn't set this register, the
mapping is effectively WB and accessing it will typically result in a
machine check exception.

Other arches are not currently likely to function correctly, seeing that
they don't have any MTRR registers to fall back on.

To solve this, add an argument to arch_add_memory() to explicitly
set the pgprot value to a specific value.

Of the arches that support MEMORY_HOTPLUG: x86_64, s390 and arm64 is a
simple change to pass the pgprot_t down to their respective functions
which set up the page tables. For x86_32, set the page tables explicitly
using _set_memory_prot() (seeing they are already mapped). For sh, reject
anything but PAGE_KERNEL settings -- this should be fine, for now, seeing
sh doesn't support ZONE_DEVICE anyway.

Cc: Dan Williams 
Cc: David Hildenbrand 
Cc: Michal Hocko 
Signed-off-by: Logan Gunthorpe 
---
 arch/arm64/mm/mmu.c| 3 ++-
 arch/ia64/mm/init.c| 4 
 arch/powerpc/mm/mem.c  | 3 ++-
 arch/s390/mm/init.c| 2 +-
 arch/sh/mm/init.c  | 3 +++
 arch/x86/mm/init_32.c  | 5 +
 arch/x86/mm/init_64.c  | 2 +-
 include/linux/memory_hotplug.h | 2 ++
 mm/memory_hotplug.c| 2 +-
 mm/memremap.c  | 6 +++---
 10 files changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 3320406579c3..9b214b0d268f 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1058,7 +1058,8 @@ int arch_add_memory(int nid, u64 start, u64 size,
flags = NO_BLOCK_MAPPINGS | NO_CONT_MAPPINGS;
 
__create_pgd_mapping(swapper_pg_dir, start, __phys_to_virt(start),
-size, PAGE_KERNEL, __pgd_pgtable_alloc, flags);
+size, modifiers->pgprot, __pgd_pgtable_alloc,
+flags);
 
memblock_clear_nomap(start, size);
 
diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c
index daf438e08b96..5fd6ae4929c9 100644
--- a/arch/ia64/mm/init.c
+++ b/arch/ia64/mm/init.c
@@ -677,6 +677,10 @@ int arch_add_memory(int nid, u64 start, u64 size,
 	int ret;
 
+	if (modifiers->pgprot != PAGE_KERNEL)
+		return -EINVAL;
+
 	ret = __add_pages(nid, start_pfn, nr_pages, modifiers);
if (ret)
printk("%s: Problem encountered in __add_pages() as ret=%d\n",
   __func__,  ret);
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 631ee684721f..fddeaee53198 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -137,7 +137,8 @@ int __ref arch_add_memory(int nid, u64 start, u64 size,
resize_hpt_for_hotplug(memblock_phys_mem_size());
 
start = (unsigned long)__va(start);
-   rc = create_section_mapping(start, start + size, nid, PAGE_KERNEL);
+   rc = create_section_mapping(start, start + size, nid,
+   modifiers->pgprot);
if (rc) {
pr_warn("Unable to create mapping for hot added memory 
0x%llx..0x%llx: %d\n",
start, start + size, rc);
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c
index ef19522ddad2..c65fb33f6a89 100644
--- a/arch/s390/mm/init.c
+++ b/arch/s390/mm/init.c
@@ -277,7 +277,7 @@ int arch_add_memory(int nid, u64 start, u64 size,
if (WARN_ON_ONCE(modifiers->altmap))
return -EINVAL;
 
-   rc = vmem_add_mapping(start, size, PAGE_KERNEL);
+   rc = vmem_add_mapping(start, size, modifiers->pgprot);
if (rc)
return rc;
 
diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c
index 7e64f42fb570..7071dc5bd2e4 100644
--- a/arch/sh/mm/init.c
+++ b/arch/sh/mm/init.c
@@ -412,6 +412,9 @@ int arch_add_memory(int nid, u64 start, u64 size,
unsigned long nr_pages = size >> PAGE_SHIFT;
int ret;
 
+   if (modifiers->pgprot != PAGE_KERNEL)
+   return -EINVAL;
+
/* We only have ZONE_NORMAL, so this is easy.. */
ret = __add_pages(nid, start_pfn, nr_pages, modifiers);
if (unlikely(ret))
diff --git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c
index 630d8a36fcd7..737da0dbc0d5 100644
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -857,6 +857,11 @@ int arch_add_memory(int nid, u64 start, u64 size,
 {
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
+   int ret;
+
+   ret = _set_memory_prot(start, nr_pages, modifiers->pgprot);
+   if (ret)
+   return ret;
 
return __add_pages(nid, start_pfn, nr_pages, 

Re: [PATCH v6 1/6] Documentation/ABI: add ABI documentation for /sys/kernel/fadump_*

2020-01-07 Thread Hari Bathini


On 11/12/19 9:39 PM, Sourabh Jain wrote:
> Add missing ABI documentation for existing FADump sysfs files.
> 
> Signed-off-by: Sourabh Jain 

The patch series adds a new sysfs attribute that provides the amount of memory
reserved for FADump. It also streamlines FADump specific sysfs interfaces.
Thanks for looking into this.

For the series...

Reviewed-by: Hari Bathini 

> ---
>  Documentation/ABI/testing/sysfs-kernel-fadump_enabled | 7 +++
>  Documentation/ABI/testing/sysfs-kernel-fadump_registered  | 8 
>  Documentation/ABI/testing/sysfs-kernel-fadump_release_mem | 8 
>  .../ABI/testing/sysfs-kernel-fadump_release_opalcore  | 7 +++
>  4 files changed, 30 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_enabled
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_registered
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
>  create mode 100644 Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore
> 
> diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_enabled b/Documentation/ABI/testing/sysfs-kernel-fadump_enabled
> new file mode 100644
> index ..f73632b1c006
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-fadump_enabled
> @@ -0,0 +1,7 @@
> +What:/sys/kernel/fadump_enabled
> +Date:Feb 2012
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: read only
> + Primarily used to identify whether the FADump is enabled in
> + the kernel or not.
> +User:Kdump service
> diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_registered b/Documentation/ABI/testing/sysfs-kernel-fadump_registered
> new file mode 100644
> index ..dcf925e53f0f
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-fadump_registered
> @@ -0,0 +1,8 @@
> +What:/sys/kernel/fadump_registered
> +Date:Feb 2012
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: read/write
> + Helps to control the dump collect feature from userspace.
> + Setting 1 to this file enables the system to collect the
> + dump and 0 to disable it.
> +User:Kdump service
> diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem b/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
> new file mode 100644
> index ..9c20d64ab48d
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-fadump_release_mem
> @@ -0,0 +1,8 @@
> +What:/sys/kernel/fadump_release_mem
> +Date:Feb 2012
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: write only
> + This is a special sysfs file and only available when
> + the system is booted to capture the vmcore using FADump.
> + It is used to release the memory reserved by FADump to
> + save the crash dump.
> diff --git a/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore b/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore
> new file mode 100644
> index ..53313c1d4e7a
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-kernel-fadump_release_opalcore
> @@ -0,0 +1,7 @@
> +What:/sys/kernel/fadump_release_opalcore
> +Date:Sep 2019
> +Contact: linuxppc-dev@lists.ozlabs.org
> +Description: write only
> + The sysfs file is available when the system is booted to
> + collect the dump on an OPAL based machine. It is used to
> + release the memory used to collect the opalcore.
> 

-- 
- Hari
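
One usage note for the interfaces documented above: a dump service typically just
writes "1" to the registration file. A hypothetical C helper (names and error
handling are ours, not from the patch):

	#include <fcntl.h>
	#include <unistd.h>

	static int fadump_register(void)
	{
		int fd = open("/sys/kernel/fadump_registered", O_WRONLY);

		if (fd < 0)
			return -1;
		if (write(fd, "1", 1) != 1) {
			close(fd);
			return -1;
		}
		return close(fd);
	}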



Patch for '5.3.7 64-bits kernel doesn't boot on G5 Quad' found (was: Re: PPC64: G5 & 4k/64k page size)

2020-01-07 Thread Romain Dolbeau
Hello all,

Great news: Aneesh sent me a patch that solves the problem on my G5 Quad :-)

I don't know whether he considers it a 'workaround' or if it's the
'proper' patch for upstream, so beware. It's a one-liner, so I attach
it to this message; those affected can test it to confirm whether it
indeed solves the problem for them as well.

If it is the 'proper' patch, I expect it should get into upstream
pretty quickly. Then Debian should be able to trivially backport it to
their PPC64 kernel, and the G5 will still be a great machine in 2020!

Thanks everyone for your help in tracking down the bug & to Aneesh for
finding a fix :-)

Cordially,

-- 
Romain Dolbeau
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 15b75005bc34..516db8a2e6ca 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -601,7 +601,7 @@ extern void slb_set_size(u16 size);
  */
 #define MAX_USER_CONTEXT	((ASM_CONST(1) << CONTEXT_BITS) - 2)
 #define MIN_USER_CONTEXT	(MAX_KERNEL_CTX_CNT + MAX_VMALLOC_CTX_CNT + \
- MAX_IO_CTX_CNT + MAX_VMEMMAP_CTX_CNT)
+ MAX_IO_CTX_CNT + MAX_VMEMMAP_CTX_CNT + 1)
 /*
  * For platforms that support on 65bit VA we limit the context bits
  */


Re: Freescale network device not activated on mpc8360 (kmeter1 board)

2020-01-07 Thread Heiner Kallweit
On 07.01.2020 14:05, Matteo Ghidoni wrote:
>  Hi all,
> 
> With the introduction of the following patch, we are facing an issue with the 
> activation of the Freescale network device (ucc_geth driver) on our kmeter1 
> board based on a MPC8360:

+Li as ucc_geth maintainer

Can you describe the symptoms of the issue?

> 
> commit 124eee3f6955f7aa19b9e6ff5c9b6d37cb3d1e2c
> Author: Heiner Kallweit 
> Date:   Tue Sep 18 21:55:36 2018 +0200
> 
> net: linkwatch: add check for netdevice being present to linkwatch_do_dev
> 
> Based on my observations, just before trying to activate the device through 
> linkwatch_event, the controller wants to adjust the MAC configuration and in 
> order to achieve this it detaches the device. This prevents the activation of
> the net device.
> 
It sounds a little bit odd to rely on an asynchronous linkwatch event here.
Can you give a call trace?

The driver is quite old and maybe some parts need to be improved. The 
referenced change is more than a year old
and I'm not aware of any other problem with it. So it seems the change isn't 
wrong.

> This is already happening with older versions (I checked with v4.14.162),
> where the situation is the same, but without the additional check in
> the if condition the device does get activated.
> 
> I am currently working with the v5.4.8 kernel version, but the behavior 
> remains the same also with the latest v5.5-rc4.
> 
> Any idea how to solve this? Any help is appreciated.
> 
> Regards,
> Matteo
> 
Heiner
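
For reference, the check that commit 124eee3f6955 added is, quoting from memory and
therefore only roughly:

	rfc2863_policy(dev);
-	if (dev->flags & IFF_UP) {
+	if (dev->flags & IFF_UP && netif_device_present(dev)) {

So if ucc_geth calls netif_device_detach() while it reconfigures the MAC, and the
deferred linkwatch event fires inside that window, the device is now skipped instead
of being brought up, which would match the symptom described.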



Re: [PATCH v3 02/22] compat: provide compat_ptr() on all architectures

2020-01-07 Thread Heiko Carstens
On Thu, Jan 02, 2020 at 03:55:20PM +0100, Arnd Bergmann wrote:
> In order to avoid needless #ifdef CONFIG_COMPAT checks,
> move the compat_ptr() definition to linux/compat.h
> where it can be seen by any file regardless of the
> architecture.
> 
> Only s390 needs a special definition, this can use the
> self-#define trick we have elsewhere.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  arch/arm64/include/asm/compat.h   | 17 -
>  arch/mips/include/asm/compat.h| 18 --
>  arch/parisc/include/asm/compat.h  | 17 -
>  arch/powerpc/include/asm/compat.h | 17 -
>  arch/powerpc/oprofile/backtrace.c |  2 +-
>  arch/s390/include/asm/compat.h|  6 +-
>  arch/sparc/include/asm/compat.h   | 17 -
>  arch/x86/include/asm/compat.h | 17 -
>  include/linux/compat.h| 18 ++
>  9 files changed, 20 insertions(+), 109 deletions(-)

For s390:

Acked-by: Heiko Carstens 



Re: PPC64: G5 & 4k/64k page size

2020-01-07 Thread Bertrand



On 07/01/2020 18:13, Romain Dolbeau wrote:

On Tue, 7 Jan 2020 at 16:27, Bertrand wrote:

I've tested the 5.5-rc package with the link you gave hereafter. When
you said "working kernel", did you mean it would not crash ? It doesn't
for me. I can boot successfully.



As your machine is booting regular 5.3 as well and seems not affected by
the bug, can you share some details about it? (exact model, number
of cores, ...).



It's this one:
https://everymac.com/systems/apple/powermac_g5/specs/powermac_g5_2.0_dp_pci.html


Power Mac G5 PCI, two 2 GHz CPUs, Radeon 9600, with 2 GB of RAM. I can send
dmesg or something if you want.


Bertrand



[PATCH 05/12] i2c: powermac: convert to use i2c_new_client_device()

2020-01-07 Thread Wolfram Sang
Move away from the deprecated API and return the shiny new ERRPTR where
useful.

Signed-off-by: Wolfram Sang 
---
Build tested only.

 drivers/i2c/busses/i2c-powermac.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/i2c/busses/i2c-powermac.c b/drivers/i2c/busses/i2c-powermac.c
index 504f5bf0e625..973e5339033c 100644
--- a/drivers/i2c/busses/i2c-powermac.c
+++ b/drivers/i2c/busses/i2c-powermac.c
@@ -240,8 +240,8 @@ static void i2c_powermac_create_one(struct i2c_adapter *adap,
 
 	strncpy(info.type, type, sizeof(info.type));
 	info.addr = addr;
-	newdev = i2c_new_device(adap, &info);
-	if (!newdev)
+	newdev = i2c_new_client_device(adap, &info);
+	if (IS_ERR(newdev))
 		dev_err(&adap->dev,
 			"i2c-powermac: Failure to register missing %s\n",
 			type);
@@ -359,8 +359,8 @@ static void i2c_powermac_register_devices(struct i2c_adapter *adap,
 		info.irq = irq_of_parse_and_map(node, 0);
 		info.of_node = of_node_get(node);
 
-		newdev = i2c_new_device(adap, &info);
-		if (!newdev) {
+		newdev = i2c_new_client_device(adap, &info);
+		if (IS_ERR(newdev)) {
 			dev_err(&adap->dev, "i2c-powermac: Failure to register"
 				" %pOF\n", node);
 			of_node_put(node);
-- 
2.20.1



[PATCH 00/12] i2c: convert subsystem to use i2c_new_client_device()

2020-01-07 Thread Wolfram Sang
This patch series converts the I2C subsystem to use the new API. Drivers
have been build tested. There is one user left in the SMBus part of the
core which will need a separate series because all users of this
function need to be checked/converted, too.

Except for documentation patches, the conversion has been done with a
coccinelle script, and further simplifications have been applied when
proofreading the patches.

A branch is here:

git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux.git renesas/i2c/new_client_device

Looking forward to comments...

Wolfram Sang (12):
  i2c: cht-wc: convert to use i2c_new_client_device()
  i2c: i801: convert to use i2c_new_client_device()
  i2c: nvidia-gpu: convert to use i2c_new_client_device()
  i2c: ocores: convert to use i2c_new_client_device()
  i2c: powermac: convert to use i2c_new_client_device()
  i2c: taos-evm: convert to use i2c_new_client_device()
  i2c: xiic: convert to use i2c_new_client_device()
  i2c: i2c-core-acpi: convert to use i2c_new_client_device()
  i2c: i2c-core-base: convert to use i2c_new_client_device()
  i2c: i2c-core-of: convert to use i2c_new_client_device()
  docs: i2c: use the new API in 'instantiating-devices.rst'
  docs: i2c: use the new API in 'writing-clients'

 Documentation/i2c/instantiating-devices.rst |  8 
 Documentation/i2c/writing-clients.rst   | 20 ++--
 drivers/i2c/busses/i2c-cht-wc.c |  6 +++---
 drivers/i2c/busses/i2c-i801.c   |  6 +++---
 drivers/i2c/busses/i2c-nvidia-gpu.c |  6 +++---
 drivers/i2c/busses/i2c-ocores.c |  2 +-
 drivers/i2c/busses/i2c-powermac.c   |  8 
 drivers/i2c/busses/i2c-taos-evm.c   |  4 ++--
 drivers/i2c/busses/i2c-xiic.c   |  2 +-
 drivers/i2c/i2c-core-acpi.c | 12 
 drivers/i2c/i2c-core-base.c | 13 ++---
 drivers/i2c/i2c-core-of.c   |  7 +++
 12 files changed, 44 insertions(+), 50 deletions(-)

-- 
2.20.1
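
The mechanical shape of each conversion, for anyone reviewing the individual patches,
is (sketch):

	/* old: NULL on failure */
	client = i2c_new_device(adap, &info);
	if (!client)
		return -ENODEV;

	/* new: ERR_PTR on failure, so the real errno can be propagated */
	client = i2c_new_client_device(adap, &info);
	if (IS_ERR(client))
		return PTR_ERR(client);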



Re: PPC64: G5 & 4k/64k page size

2020-01-07 Thread Romain Dolbeau
On Tue, 7 Jan 2020 at 16:27, Bertrand wrote:
> I've tested the 5.5-rc package with the link you gave hereafter. When
> you said "working kernel", did you mean it would not crash ? It doesn't
> for me. I can boot successfully.

Yes, unlike the default Debian build, this one has 4 KiB page size.
It's not a solution, but it's a workaround for those affected by the bug
and perhaps a clue to the issue.

As your machine is booting regular 5.3 as well and seems not affected by
the bug, can you share some details about it? (exact model, number
of cores, ...).

> The only weirdness I could notice is that my swap space wasn't mounted.

No surprise; I think the on-disk format for the swap space hardwires
the page size. There's even a '-p' option to 'mkswap' to specify a
page size when creating the swap space.

Cordially,

--
Romain Dolbeau


Re: [RFT 06/13] arc: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Arnd Bergmann
On Tue, Jan 7, 2020 at 5:54 PM Krzysztof Kozlowski  wrote:
>
> The ioreadX() helpers have inconsistent interface.  On some architectures
> void *__iomem address argument is a pointer to const, on some not.
>
> Implementations of ioreadX() do not modify the memory under the
> address so they can be converted to a "const" version for const-safety
> and consistency among architectures.
>
> Signed-off-by: Krzysztof Kozlowski 

The patch looks correct, but I would not bother here, as it has no
effect on the compiler or its output.

  Arnd


Re: [RFT 03/13] sh: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Arnd Bergmann
On Tue, Jan 7, 2020 at 5:54 PM Krzysztof Kozlowski  wrote:
>
> The ioreadX() helpers have inconsistent interface.  On some architectures
> void *__iomem address argument is a pointer to const, on some not.
>
> Implementations of ioreadX() do not modify the memory under the address
> so they can be converted to a "const" version for const-safety and
> consistency among architectures.
>
> Signed-off-by: Krzysztof Kozlowski 

The patch looks good, but I think this has to be done together with the
powerpc version and the header that declares the function, for
bisectability.

   Arnd


[RFT 13/13] virtio: pci: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/virtio/virtio_pci_modern.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index 7abcc50838b8..fc58db4ab6c3 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -26,16 +26,16 @@
  * method, i.e. 32-bit accesses for 32-bit fields, 16-bit accesses
  * for 16-bit fields and 8-bit accesses for 8-bit fields.
  */
-static inline u8 vp_ioread8(u8 __iomem *addr)
+static inline u8 vp_ioread8(const u8 __iomem *addr)
 {
return ioread8(addr);
 }
-static inline u16 vp_ioread16(__le16 __iomem *addr)
+static inline u16 vp_ioread16(const __le16 __iomem *addr)
 {
return ioread16(addr);
 }
 
-static inline u32 vp_ioread32(__le32 __iomem *addr)
+static inline u32 vp_ioread32(const __le32 __iomem *addr)
 {
return ioread32(addr);
 }
-- 
2.7.4



[RFT 12/13] ntb: intel: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/ntb/hw/intel/ntb_hw_gen1.c  | 2 +-
 drivers/ntb/hw/intel/ntb_hw_gen3.h  | 2 +-
 drivers/ntb/hw/intel/ntb_hw_intel.h | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/ntb/hw/intel/ntb_hw_gen1.c 
b/drivers/ntb/hw/intel/ntb_hw_gen1.c
index bb57ec239029..9202502a9787 100644
--- a/drivers/ntb/hw/intel/ntb_hw_gen1.c
+++ b/drivers/ntb/hw/intel/ntb_hw_gen1.c
@@ -1202,7 +1202,7 @@ int intel_ntb_peer_spad_write(struct ntb_dev *ntb, int 
pidx, int sidx,
   ndev->peer_reg->spad);
 }
 
-static u64 xeon_db_ioread(void __iomem *mmio)
+static u64 xeon_db_ioread(const void __iomem *mmio)
 {
return (u64)ioread16(mmio);
 }
diff --git a/drivers/ntb/hw/intel/ntb_hw_gen3.h 
b/drivers/ntb/hw/intel/ntb_hw_gen3.h
index 75fb86ca27bb..d1455f24ec99 100644
--- a/drivers/ntb/hw/intel/ntb_hw_gen3.h
+++ b/drivers/ntb/hw/intel/ntb_hw_gen3.h
@@ -91,7 +91,7 @@
 #define GEN3_DB_TOTAL_SHIFT33
 #define GEN3_SPAD_COUNT16
 
-static inline u64 gen3_db_ioread(void __iomem *mmio)
+static inline u64 gen3_db_ioread(const void __iomem *mmio)
 {
return ioread64(mmio);
 }
diff --git a/drivers/ntb/hw/intel/ntb_hw_intel.h 
b/drivers/ntb/hw/intel/ntb_hw_intel.h
index e071e28bca3f..3c0a5a2da241 100644
--- a/drivers/ntb/hw/intel/ntb_hw_intel.h
+++ b/drivers/ntb/hw/intel/ntb_hw_intel.h
@@ -102,7 +102,7 @@ struct intel_ntb_dev;
 struct intel_ntb_reg {
int (*poll_link)(struct intel_ntb_dev *ndev);
int (*link_is_up)(struct intel_ntb_dev *ndev);
-   u64 (*db_ioread)(void __iomem *mmio);
+   u64 (*db_ioread)(const void __iomem *mmio);
void (*db_iowrite)(u64 db_bits, void __iomem *mmio);
unsigned long   ntb_ctl;
resource_size_t db_size;
-- 
2.7.4



[RFT 11/13] net: wireless: rtl818x: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h 
b/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h
index 7948a2da195a..2ff00800d45b 100644
--- a/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h
+++ b/drivers/net/wireless/realtek/rtl818x/rtl8180/rtl8180.h
@@ -150,17 +150,17 @@ void rtl8180_write_phy(struct ieee80211_hw *dev, u8 addr, 
u32 data);
 void rtl8180_set_anaparam(struct rtl8180_priv *priv, u32 anaparam);
 void rtl8180_set_anaparam2(struct rtl8180_priv *priv, u32 anaparam2);
 
-static inline u8 rtl818x_ioread8(struct rtl8180_priv *priv, u8 __iomem *addr)
+static inline u8 rtl818x_ioread8(struct rtl8180_priv *priv, const u8 __iomem 
*addr)
 {
return ioread8(addr);
 }
 
-static inline u16 rtl818x_ioread16(struct rtl8180_priv *priv, __le16 __iomem 
*addr)
+static inline u16 rtl818x_ioread16(struct rtl8180_priv *priv, const __le16 
__iomem *addr)
 {
return ioread16(addr);
 }
 
-static inline u32 rtl818x_ioread32(struct rtl8180_priv *priv, __le32 __iomem 
*addr)
+static inline u32 rtl818x_ioread32(struct rtl8180_priv *priv, const __le32 
__iomem *addr)
 {
return ioread32(addr);
 }
-- 
2.7.4



[RFT 10/13] net: wireless: ath5k: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/net/wireless/ath/ath5k/ahb.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/wireless/ath/ath5k/ahb.c 
b/drivers/net/wireless/ath/ath5k/ahb.c
index 2c9cec8b53d9..8bd01df369fb 100644
--- a/drivers/net/wireless/ath/ath5k/ahb.c
+++ b/drivers/net/wireless/ath/ath5k/ahb.c
@@ -138,18 +138,18 @@ static int ath_ahb_probe(struct platform_device *pdev)
 
if (bcfg->devid >= AR5K_SREV_AR2315_R6) {
/* Enable WMAC AHB arbitration */
-   reg = ioread32((void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
+   reg = ioread32((const void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
reg |= AR5K_AR2315_AHB_ARB_CTL_WLAN;
iowrite32(reg, (void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
 
/* Enable global WMAC swapping */
-   reg = ioread32((void __iomem *) AR5K_AR2315_BYTESWAP);
+   reg = ioread32((const void __iomem *) AR5K_AR2315_BYTESWAP);
reg |= AR5K_AR2315_BYTESWAP_WMAC;
iowrite32(reg, (void __iomem *) AR5K_AR2315_BYTESWAP);
} else {
/* Enable WMAC DMA access (assuming 5312 or 231x*/
/* TODO: check other platforms */
-   reg = ioread32((void __iomem *) AR5K_AR5312_ENABLE);
+   reg = ioread32((const void __iomem *) AR5K_AR5312_ENABLE);
if (to_platform_device(ah->dev)->id == 0)
reg |= AR5K_AR5312_ENABLE_WLAN0;
else
@@ -202,12 +202,12 @@ static int ath_ahb_remove(struct platform_device *pdev)
 
if (bcfg->devid >= AR5K_SREV_AR2315_R6) {
/* Disable WMAC AHB arbitration */
-   reg = ioread32((void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
+   reg = ioread32((const void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
reg &= ~AR5K_AR2315_AHB_ARB_CTL_WLAN;
iowrite32(reg, (void __iomem *) AR5K_AR2315_AHB_ARB_CTL);
} else {
/*Stop DMA access */
-   reg = ioread32((void __iomem *) AR5K_AR5312_ENABLE);
+   reg = ioread32((const void __iomem *) AR5K_AR5312_ENABLE);
if (to_platform_device(ah->dev)->id == 0)
reg &= ~AR5K_AR5312_ENABLE_WLAN0;
else
-- 
2.7.4



[RFT 09/13] media: fsl-viu: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/media/platform/fsl-viu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/media/platform/fsl-viu.c b/drivers/media/platform/fsl-viu.c
index 81a8faedbba6..991d9dc82749 100644
--- a/drivers/media/platform/fsl-viu.c
+++ b/drivers/media/platform/fsl-viu.c
@@ -34,7 +34,7 @@
 /* Allow building this driver with COMPILE_TEST */
 #if !defined(CONFIG_PPC) && !defined(CONFIG_MICROBLAZE)
 #define out_be32(v, a) iowrite32be(a, (void __iomem *)v)
-#define in_be32(a) ioread32be((void __iomem *)a)
+#define in_be32(a) ioread32be((const void __iomem *)a)
 #endif
 
 #define BUFFER_TIMEOUT msecs_to_jiffies(500)  /* 0.5 seconds */
-- 
2.7.4



[RFT 08/13] drm/nouveau: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c 
b/drivers/gpu/drm/nouveau/nouveau_bo.c
index f8015e0318d7..5120d062c2df 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -613,7 +613,7 @@ nouveau_bo_rd32(struct nouveau_bo *nvbo, unsigned index)
mem += index;
 
if (is_iomem)
-   return ioread32_native((void __force __iomem *)mem);
+   return ioread32_native((const void __force __iomem *)mem);
else
return *mem;
 }
-- 
2.7.4



[RFT 07/13] drm/mgag200: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 drivers/gpu/drm/mgag200/mgag200_drv.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/mgag200/mgag200_drv.h 
b/drivers/gpu/drm/mgag200/mgag200_drv.h
index aa32aad222c2..6512b3af4fb7 100644
--- a/drivers/gpu/drm/mgag200/mgag200_drv.h
+++ b/drivers/gpu/drm/mgag200/mgag200_drv.h
@@ -34,9 +34,9 @@
 
 #define MGAG200FB_CONN_LIMIT 1
 
-#define RREG8(reg) ioread8(((void __iomem *)mdev->rmmio) + (reg))
+#define RREG8(reg) ioread8(((const void __iomem *)mdev->rmmio) + (reg))
 #define WREG8(reg, v) iowrite8(v, ((void __iomem *)mdev->rmmio) + (reg))
-#define RREG32(reg) ioread32(((void __iomem *)mdev->rmmio) + (reg))
+#define RREG32(reg) ioread32(((const void __iomem *)mdev->rmmio) + (reg))
 #define WREG32(reg, v) iowrite32(v, ((void __iomem *)mdev->rmmio) + (reg))
 
 #define ATTR_INDEX 0x1fc0
-- 
2.7.4



[RFT 06/13] arc: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the
address so they can be converted to a "const" version for const-safety
and consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 arch/arc/plat-axs10x/axs10x.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arc/plat-axs10x/axs10x.c b/arch/arc/plat-axs10x/axs10x.c
index 63ea5a606ecd..180c260a8221 100644
--- a/arch/arc/plat-axs10x/axs10x.c
+++ b/arch/arc/plat-axs10x/axs10x.c
@@ -84,7 +84,7 @@ static void __init axs10x_print_board_ver(unsigned int creg, 
const char *str)
unsigned int val;
} board;
 
-   board.val = ioread32((void __iomem *)creg);
+   board.val = ioread32((const void __iomem *)creg);
pr_info("AXS: %s FPGA Date: %u-%u-%u\n", str, board.d, board.m,
board.y);
 }
@@ -95,7 +95,7 @@ static void __init axs10x_early_init(void)
char mb[32];
 
/* Determine motherboard version */
-   if (ioread32((void __iomem *) CREG_MB_CONFIG) & (1 << 28))
+   if (ioread32((const void __iomem *) CREG_MB_CONFIG) & (1 << 28))
mb_rev = 3; /* HT-3 (rev3.0) */
else
mb_rev = 2; /* HT-2 (rev2.0) */
-- 
2.7.4



[RFT 05/13] powerpc: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 arch/powerpc/kernel/iomap.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/kernel/iomap.c b/arch/powerpc/kernel/iomap.c
index 5ac84efc6ede..de8da1c3496f 100644
--- a/arch/powerpc/kernel/iomap.c
+++ b/arch/powerpc/kernel/iomap.c
@@ -15,23 +15,23 @@
  * Here comes the ppc64 implementation of the IOMAP 
  * interfaces.
  */
-unsigned int ioread8(void __iomem *addr)
+unsigned int ioread8(const void __iomem *addr)
 {
return readb(addr);
 }
-unsigned int ioread16(void __iomem *addr)
+unsigned int ioread16(const void __iomem *addr)
 {
return readw(addr);
 }
-unsigned int ioread16be(void __iomem *addr)
+unsigned int ioread16be(const void __iomem *addr)
 {
return readw_be(addr);
 }
-unsigned int ioread32(void __iomem *addr)
+unsigned int ioread32(const void __iomem *addr)
 {
return readl(addr);
 }
-unsigned int ioread32be(void __iomem *addr)
+unsigned int ioread32be(const void __iomem *addr)
 {
return readl_be(addr);
 }
@@ -41,27 +41,27 @@ EXPORT_SYMBOL(ioread16be);
 EXPORT_SYMBOL(ioread32);
 EXPORT_SYMBOL(ioread32be);
 #ifdef __powerpc64__
-u64 ioread64(void __iomem *addr)
+u64 ioread64(const void __iomem *addr)
 {
return readq(addr);
 }
-u64 ioread64_lo_hi(void __iomem *addr)
+u64 ioread64_lo_hi(const void __iomem *addr)
 {
return readq(addr);
 }
-u64 ioread64_hi_lo(void __iomem *addr)
+u64 ioread64_hi_lo(const void __iomem *addr)
 {
return readq(addr);
 }
-u64 ioread64be(void __iomem *addr)
+u64 ioread64be(const void __iomem *addr)
 {
return readq_be(addr);
 }
-u64 ioread64be_lo_hi(void __iomem *addr)
+u64 ioread64be_lo_hi(const void __iomem *addr)
 {
return readq_be(addr);
 }
-u64 ioread64be_hi_lo(void __iomem *addr)
+u64 ioread64be_hi_lo(const void __iomem *addr)
 {
return readq_be(addr);
 }
-- 
2.7.4



[RFT 04/13] parisc: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 arch/parisc/include/asm/io.h |  4 ++--
 arch/parisc/lib/iomap.c  | 48 ++--
 2 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/arch/parisc/include/asm/io.h b/arch/parisc/include/asm/io.h
index cab8f64ca4a2..f48fb8d76e15 100644
--- a/arch/parisc/include/asm/io.h
+++ b/arch/parisc/include/asm/io.h
@@ -303,8 +303,8 @@ extern void outsl (unsigned long port, const void *src, 
unsigned long count);
 #define ioread64be ioread64be
 #define iowrite64 iowrite64
 #define iowrite64be iowrite64be
-extern u64 ioread64(void __iomem *addr);
-extern u64 ioread64be(void __iomem *addr);
+extern u64 ioread64(const void __iomem *addr);
+extern u64 ioread64be(const void __iomem *addr);
 extern void iowrite64(u64 val, void __iomem *addr);
 extern void iowrite64be(u64 val, void __iomem *addr);
 
diff --git a/arch/parisc/lib/iomap.c b/arch/parisc/lib/iomap.c
index 0195aec657e2..e01345d2a7d9 100644
--- a/arch/parisc/lib/iomap.c
+++ b/arch/parisc/lib/iomap.c
@@ -43,13 +43,13 @@
 #endif
 
 struct iomap_ops {
-   unsigned int (*read8)(void __iomem *);
-   unsigned int (*read16)(void __iomem *);
-   unsigned int (*read16be)(void __iomem *);
-   unsigned int (*read32)(void __iomem *);
-   unsigned int (*read32be)(void __iomem *);
-   u64 (*read64)(void __iomem *);
-   u64 (*read64be)(void __iomem *);
+   unsigned int (*read8)(const void __iomem *);
+   unsigned int (*read16)(const void __iomem *);
+   unsigned int (*read16be)(const void __iomem *);
+   unsigned int (*read32)(const void __iomem *);
+   unsigned int (*read32be)(const void __iomem *);
+   u64 (*read64)(const void __iomem *);
+   u64 (*read64be)(const void __iomem *);
void (*write8)(u8, void __iomem *);
void (*write16)(u16, void __iomem *);
void (*write16be)(u16, void __iomem *);
@@ -69,17 +69,17 @@ struct iomap_ops {
 
 #define ADDR2PORT(addr) ((unsigned long __force)(addr) & 0xff)
 
-static unsigned int ioport_read8(void __iomem *addr)
+static unsigned int ioport_read8(const void __iomem *addr)
 {
return inb(ADDR2PORT(addr));
 }
 
-static unsigned int ioport_read16(void __iomem *addr)
+static unsigned int ioport_read16(const void __iomem *addr)
 {
return inw(ADDR2PORT(addr));
 }
 
-static unsigned int ioport_read32(void __iomem *addr)
+static unsigned int ioport_read32(const void __iomem *addr)
 {
return inl(ADDR2PORT(addr));
 }
@@ -150,37 +150,37 @@ static const struct iomap_ops ioport_ops = {
 
 /* Legacy I/O memory ops */
 
-static unsigned int iomem_read8(void __iomem *addr)
+static unsigned int iomem_read8(const void __iomem *addr)
 {
return readb(addr);
 }
 
-static unsigned int iomem_read16(void __iomem *addr)
+static unsigned int iomem_read16(const void __iomem *addr)
 {
return readw(addr);
 }
 
-static unsigned int iomem_read16be(void __iomem *addr)
+static unsigned int iomem_read16be(const void __iomem *addr)
 {
return __raw_readw(addr);
 }
 
-static unsigned int iomem_read32(void __iomem *addr)
+static unsigned int iomem_read32(const void __iomem *addr)
 {
return readl(addr);
 }
 
-static unsigned int iomem_read32be(void __iomem *addr)
+static unsigned int iomem_read32be(const void __iomem *addr)
 {
return __raw_readl(addr);
 }
 
-static u64 iomem_read64(void __iomem *addr)
+static u64 iomem_read64(const void __iomem *addr)
 {
return readq(addr);
 }
 
-static u64 iomem_read64be(void __iomem *addr)
+static u64 iomem_read64be(const void __iomem *addr)
 {
return __raw_readq(addr);
 }
@@ -297,49 +297,49 @@ static const struct iomap_ops *iomap_ops[8] = {
 };
 
 
-unsigned int ioread8(void __iomem *addr)
+unsigned int ioread8(const void __iomem *addr)
 {
if (unlikely(INDIRECT_ADDR(addr)))
return iomap_ops[ADDR_TO_REGION(addr)]->read8(addr);
return *((u8 *)addr);
 }
 
-unsigned int ioread16(void __iomem *addr)
+unsigned int ioread16(const void __iomem *addr)
 {
if (unlikely(INDIRECT_ADDR(addr)))
return iomap_ops[ADDR_TO_REGION(addr)]->read16(addr);
return le16_to_cpup((u16 *)addr);
 }
 
-unsigned int ioread16be(void __iomem *addr)
+unsigned int ioread16be(const void __iomem *addr)
 {
if (unlikely(INDIRECT_ADDR(addr)))
return iomap_ops[ADDR_TO_REGION(addr)]->read16be(addr);
return *((u16 *)addr);
 }
 
-unsigned int ioread32(void __iomem *addr)
+unsigned int ioread32(const void __iomem *addr)
 {
if (unlikely(INDIRECT_ADDR(addr)))
return iomap_ops[ADDR_TO_REGION(addr)]->read32(addr);
  

[RFT 03/13] sh: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 arch/sh/kernel/iomap.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/sh/kernel/iomap.c b/arch/sh/kernel/iomap.c
index ef9e2c97cbb7..bd5e212c6ea6 100644
--- a/arch/sh/kernel/iomap.c
+++ b/arch/sh/kernel/iomap.c
@@ -8,31 +8,31 @@
 #include 
 #include 
 
-unsigned int ioread8(void __iomem *addr)
+unsigned int ioread8(const void __iomem *addr)
 {
return readb(addr);
 }
 EXPORT_SYMBOL(ioread8);
 
-unsigned int ioread16(void __iomem *addr)
+unsigned int ioread16(const void __iomem *addr)
 {
return readw(addr);
 }
 EXPORT_SYMBOL(ioread16);
 
-unsigned int ioread16be(void __iomem *addr)
+unsigned int ioread16be(const void __iomem *addr)
 {
return be16_to_cpu(__raw_readw(addr));
 }
 EXPORT_SYMBOL(ioread16be);
 
-unsigned int ioread32(void __iomem *addr)
+unsigned int ioread32(const void __iomem *addr)
 {
return readl(addr);
 }
 EXPORT_SYMBOL(ioread32);
 
-unsigned int ioread32be(void __iomem *addr)
+unsigned int ioread32be(const void __iomem *addr)
 {
return be32_to_cpu(__raw_readl(addr));
 }
-- 
2.7.4



[RFT 02/13] alpha: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Signed-off-by: Krzysztof Kozlowski 
---
 arch/alpha/include/asm/core_apecs.h  |  6 +++---
 arch/alpha/include/asm/core_cia.h|  6 +++---
 arch/alpha/include/asm/core_lca.h|  6 +++---
 arch/alpha/include/asm/core_marvel.h |  4 ++--
 arch/alpha/include/asm/core_mcpcia.h |  6 +++---
 arch/alpha/include/asm/core_t2.h |  2 +-
 arch/alpha/include/asm/io.h  | 12 ++--
 arch/alpha/include/asm/io_trivial.h  | 16 
 arch/alpha/include/asm/jensen.h  |  2 +-
 arch/alpha/include/asm/machvec.h |  6 +++---
 arch/alpha/kernel/core_marvel.c  |  2 +-
 arch/alpha/kernel/io.c   |  6 +++---
 12 files changed, 37 insertions(+), 37 deletions(-)

diff --git a/arch/alpha/include/asm/core_apecs.h 
b/arch/alpha/include/asm/core_apecs.h
index 0a07055bc0fe..2d9726fc02ef 100644
--- a/arch/alpha/include/asm/core_apecs.h
+++ b/arch/alpha/include/asm/core_apecs.h
@@ -384,7 +384,7 @@ struct el_apecs_procdata
}   \
} while (0)
 
-__EXTERN_INLINE unsigned int apecs_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread8(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -420,7 +420,7 @@ __EXTERN_INLINE void apecs_iowrite8(u8 b, void __iomem 
*xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread16(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -456,7 +456,7 @@ __EXTERN_INLINE void apecs_iowrite16(u16 b, void __iomem 
*xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int apecs_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int apecs_ioread32(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
if (addr < APECS_DENSE_MEM)
diff --git a/arch/alpha/include/asm/core_cia.h 
b/arch/alpha/include/asm/core_cia.h
index c706a7f2b061..cb22991f6761 100644
--- a/arch/alpha/include/asm/core_cia.h
+++ b/arch/alpha/include/asm/core_cia.h
@@ -342,7 +342,7 @@ struct el_CIA_sysdata_mcheck {
 #define vuip   volatile unsigned int __force *
 #define vulp   volatile unsigned long __force *
 
-__EXTERN_INLINE unsigned int cia_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread8(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -374,7 +374,7 @@ __EXTERN_INLINE void cia_iowrite8(u8 b, void __iomem *xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread16(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -404,7 +404,7 @@ __EXTERN_INLINE void cia_iowrite16(u16 b, void __iomem 
*xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int cia_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int cia_ioread32(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
if (addr < CIA_DENSE_MEM)
diff --git a/arch/alpha/include/asm/core_lca.h 
b/arch/alpha/include/asm/core_lca.h
index 84d5e5b84f4f..ec86314418cb 100644
--- a/arch/alpha/include/asm/core_lca.h
+++ b/arch/alpha/include/asm/core_lca.h
@@ -230,7 +230,7 @@ union el_lca {
} while (0)
 
 
-__EXTERN_INLINE unsigned int lca_ioread8(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int lca_ioread8(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -266,7 +266,7 @@ __EXTERN_INLINE void lca_iowrite8(u8 b, void __iomem *xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int lca_ioread16(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int lca_ioread16(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
unsigned long result, base_and_type;
@@ -302,7 +302,7 @@ __EXTERN_INLINE void lca_iowrite16(u16 b, void __iomem 
*xaddr)
*(vuip) ((addr << 5) + base_and_type) = w;
 }
 
-__EXTERN_INLINE unsigned int lca_ioread32(void __iomem *xaddr)
+__EXTERN_INLINE unsigned int lca_ioread32(const void __iomem *xaddr)
 {
unsigned long addr = (unsigned long) xaddr;
if (addr < LCA_DENSE_MEM)
diff --git a/arch/alpha/include/asm/core_marvel.h 
b/arch/alpha/include/asm/core_marvel.h
index 

[RFT 01/13] iomap: Constify ioreadX() iomem argument (as in generic implementation)

2020-01-07 Thread Krzysztof Kozlowski
The ioreadX() helpers have inconsistent interface.  On some architectures
void *__iomem address argument is a pointer to const, on some not.

Implementations of ioreadX() do not modify the memory under the address
so they can be converted to a "const" version for const-safety and
consistency among architectures.

Suggested-by: Geert Uytterhoeven 
Signed-off-by: Krzysztof Kozlowski 
---
 include/asm-generic/iomap.h   | 22 +++---
 include/linux/io-64-nonatomic-hi-lo.h |  4 ++--
 include/linux/io-64-nonatomic-lo-hi.h |  4 ++--
 lib/iomap.c   | 18 +-
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/asm-generic/iomap.h b/include/asm-generic/iomap.h
index 9d28a5e82f73..986e894bef49 100644
--- a/include/asm-generic/iomap.h
+++ b/include/asm-generic/iomap.h
@@ -26,14 +26,14 @@
  * in the low address range. Architectures for which this is not
  * true can't use this generic implementation.
  */
-extern unsigned int ioread8(void __iomem *);
-extern unsigned int ioread16(void __iomem *);
-extern unsigned int ioread16be(void __iomem *);
-extern unsigned int ioread32(void __iomem *);
-extern unsigned int ioread32be(void __iomem *);
+extern unsigned int ioread8(const void __iomem *);
+extern unsigned int ioread16(const void __iomem *);
+extern unsigned int ioread16be(const void __iomem *);
+extern unsigned int ioread32(const void __iomem *);
+extern unsigned int ioread32be(const void __iomem *);
 #ifdef CONFIG_64BIT
-extern u64 ioread64(void __iomem *);
-extern u64 ioread64be(void __iomem *);
+extern u64 ioread64(const void __iomem *);
+extern u64 ioread64be(const void __iomem *);
 #endif
 
 #ifdef readq
@@ -41,10 +41,10 @@ extern u64 ioread64be(void __iomem *);
 #define ioread64_hi_lo ioread64_hi_lo
 #define ioread64be_lo_hi ioread64be_lo_hi
 #define ioread64be_hi_lo ioread64be_hi_lo
-extern u64 ioread64_lo_hi(void __iomem *addr);
-extern u64 ioread64_hi_lo(void __iomem *addr);
-extern u64 ioread64be_lo_hi(void __iomem *addr);
-extern u64 ioread64be_hi_lo(void __iomem *addr);
+extern u64 ioread64_lo_hi(const void __iomem *addr);
+extern u64 ioread64_hi_lo(const void __iomem *addr);
+extern u64 ioread64be_lo_hi(const void __iomem *addr);
+extern u64 ioread64be_hi_lo(const void __iomem *addr);
 #endif
 
 extern void iowrite8(u8, void __iomem *);
diff --git a/include/linux/io-64-nonatomic-hi-lo.h 
b/include/linux/io-64-nonatomic-hi-lo.h
index ae21b72cce85..f32522bb3aa5 100644
--- a/include/linux/io-64-nonatomic-hi-lo.h
+++ b/include/linux/io-64-nonatomic-hi-lo.h
@@ -57,7 +57,7 @@ static inline void hi_lo_writeq_relaxed(__u64 val, volatile 
void __iomem *addr)
 
 #ifndef ioread64_hi_lo
 #define ioread64_hi_lo ioread64_hi_lo
-static inline u64 ioread64_hi_lo(void __iomem *addr)
+static inline u64 ioread64_hi_lo(const void __iomem *addr)
 {
u32 low, high;
 
@@ -79,7 +79,7 @@ static inline void iowrite64_hi_lo(u64 val, void __iomem 
*addr)
 
 #ifndef ioread64be_hi_lo
 #define ioread64be_hi_lo ioread64be_hi_lo
-static inline u64 ioread64be_hi_lo(void __iomem *addr)
+static inline u64 ioread64be_hi_lo(const void __iomem *addr)
 {
u32 low, high;
 
diff --git a/include/linux/io-64-nonatomic-lo-hi.h 
b/include/linux/io-64-nonatomic-lo-hi.h
index faaa842dbdb9..448a21435dba 100644
--- a/include/linux/io-64-nonatomic-lo-hi.h
+++ b/include/linux/io-64-nonatomic-lo-hi.h
@@ -57,7 +57,7 @@ static inline void lo_hi_writeq_relaxed(__u64 val, volatile 
void __iomem *addr)
 
 #ifndef ioread64_lo_hi
 #define ioread64_lo_hi ioread64_lo_hi
-static inline u64 ioread64_lo_hi(void __iomem *addr)
+static inline u64 ioread64_lo_hi(const void __iomem *addr)
 {
u32 low, high;
 
@@ -79,7 +79,7 @@ static inline void iowrite64_lo_hi(u64 val, void __iomem 
*addr)
 
 #ifndef ioread64be_lo_hi
 #define ioread64be_lo_hi ioread64be_lo_hi
-static inline u64 ioread64be_lo_hi(void __iomem *addr)
+static inline u64 ioread64be_lo_hi(const void __iomem *addr)
 {
u32 low, high;
 
diff --git a/lib/iomap.c b/lib/iomap.c
index e909ab71e995..3b10c0ab2cee 100644
--- a/lib/iomap.c
+++ b/lib/iomap.c
@@ -70,27 +70,27 @@ static void bad_io_access(unsigned long port, const char 
*access)
 #define mmio_read64be(addr) swab64(readq(addr))
 #endif
 
-unsigned int ioread8(void __iomem *addr)
+unsigned int ioread8(const void __iomem *addr)
 {
IO_COND(addr, return inb(port), return readb(addr));
return 0xff;
 }
-unsigned int ioread16(void __iomem *addr)
+unsigned int ioread16(const void __iomem *addr)
 {
IO_COND(addr, return inw(port), return readw(addr));
return 0x;
 }
-unsigned int ioread16be(void __iomem *addr)
+unsigned int ioread16be(const void __iomem *addr)
 {
IO_COND(addr, return pio_read16be(port), return mmio_read16be(addr));
return 0x;
 }
-unsigned int ioread32(void __iomem *addr)
+unsigned int ioread32(const void __iomem *addr)
 {
IO_COND(addr, return inl(port), return readl(addr));
   

[RFT 00/13] iomap: Constify ioreadX() iomem argument

2020-01-07 Thread Krzysztof Kozlowski
Hi,

The ioread8/16/32() helpers and friends have an inconsistent interface
among the architectures: some take the address as a pointer to const,
some do not.

It seems there is nothing really stopping all of them from taking a
pointer to const.
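
As a minimal illustration (hypothetical driver code, not taken from
this series): a driver that keeps its MMIO cookie as a pointer to const
currently cannot pass it to ioread32() without a cast or a
-Wdiscarded-qualifiers warning; with the constified prototypes it
compiles cleanly:

#include <linux/io.h>
#include <linux/types.h>

/* Registers are only ever read through this cookie, so it can be
 * declared as a pointer to const. */
struct foo_priv {
	const void __iomem *regs;
};

static u32 foo_get_status(const struct foo_priv *priv)
{
	/* OK once ioread32() takes "const void __iomem *" */
	return ioread32(priv->regs + 0x04);
}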

The patchset was not really tested on all affected architectures.
Build testing is in progress - I hope the auto-builders will point out
any issues.


Todo
====
Also convert the string versions (ioread16_rep() etc.) if this approach
looks OK.


Merging
===
The first 5 patches - iomap, alpha, sh, parisc and powerpc - should
probably go in via one tree, or even be squashed into one.

All the others can go in separately after these get merged.

Best regards,
Krzysztof


Krzysztof Kozlowski (13):
  iomap: Constify ioreadX() iomem argument (as in generic
implementation)
  alpha: Constify ioreadX() iomem argument (as in generic
implementation)
  sh: Constify ioreadX() iomem argument (as in generic implementation)
  parisc: Constify ioreadX() iomem argument (as in generic
implementation)
  powerpc: Constify ioreadX() iomem argument (as in generic
implementation)
  arc: Constify ioreadX() iomem argument (as in generic implementation)
  drm/mgag200: Constify ioreadX() iomem argument (as in generic
implementation)
  drm/nouveau: Constify ioreadX() iomem argument (as in generic
implementation)
  media: fsl-viu: Constify ioreadX() iomem argument (as in generic
implementation)
  net: wireless: ath5k: Constify ioreadX() iomem argument (as in generic
implementation)
  net: wireless: rtl818x: Constify ioreadX() iomem argument (as in
generic implementation)
  ntb: intel: Constify ioreadX() iomem argument (as in generic
implementation)
  virtio: pci: Constify ioreadX() iomem argument (as in generic
implementation)

 arch/alpha/include/asm/core_apecs.h|  6 +--
 arch/alpha/include/asm/core_cia.h  |  6 +--
 arch/alpha/include/asm/core_lca.h  |  6 +--
 arch/alpha/include/asm/core_marvel.h   |  4 +-
 arch/alpha/include/asm/core_mcpcia.h   |  6 +--
 arch/alpha/include/asm/core_t2.h   |  2 +-
 arch/alpha/include/asm/io.h| 12 +++---
 arch/alpha/include/asm/io_trivial.h| 16 
 arch/alpha/include/asm/jensen.h|  2 +-
 arch/alpha/include/asm/machvec.h   |  6 +--
 arch/alpha/kernel/core_marvel.c|  2 +-
 arch/alpha/kernel/io.c |  6 +--
 arch/arc/plat-axs10x/axs10x.c  |  4 +-
 arch/parisc/include/asm/io.h   |  4 +-
 arch/parisc/lib/iomap.c| 48 +++---
 arch/powerpc/kernel/iomap.c| 22 +-
 arch/sh/kernel/iomap.c | 10 ++---
 drivers/gpu/drm/mgag200/mgag200_drv.h  |  4 +-
 drivers/gpu/drm/nouveau/nouveau_bo.c   |  2 +-
 drivers/media/platform/fsl-viu.c   |  2 +-
 drivers/net/wireless/ath/ath5k/ahb.c   | 10 ++---
 .../net/wireless/realtek/rtl818x/rtl8180/rtl8180.h |  6 +--
 drivers/ntb/hw/intel/ntb_hw_gen1.c |  2 +-
 drivers/ntb/hw/intel/ntb_hw_gen3.h |  2 +-
 drivers/ntb/hw/intel/ntb_hw_intel.h|  2 +-
 drivers/virtio/virtio_pci_modern.c |  6 +--
 include/asm-generic/iomap.h| 22 +-
 include/linux/io-64-nonatomic-hi-lo.h  |  4 +-
 include/linux/io-64-nonatomic-lo-hi.h  |  4 +-
 lib/iomap.c| 18 
 30 files changed, 123 insertions(+), 123 deletions(-)

-- 
2.7.4



Re: [PATCH v4 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

2020-01-07 Thread Ram Pai
On Mon, Jan 06, 2020 at 06:02:37PM -0800, Sukadev Bhattiprolu wrote:
> Ram Pai [linux...@us.ibm.com] wrote:
> >
> > One small comment.. H_STATE is a better return code than H_UNSUPPORTED.
> > 
> 
> Here is the updated patch - we now return H_STATE if the abort call is
> made after the VM has gone secure.
> ---
> From 73fe1fa5aff2829f2fae6a339169e56dc0bbae06 Mon Sep 17 00:00:00 2001
> From: Sukadev Bhattiprolu 
> Date: Fri, 27 Sep 2019 14:30:36 -0500
> Subject: [PATCH 2/2] KVM: PPC: Implement H_SVM_INIT_ABORT hcall

This patch looks good.

Reviewed-by: Ram Pai 

Thanks,
RP



Re: PPC64: G5 & 4k/64k page size

2020-01-07 Thread Bertrand



On 06/01/2020 19:30, Romain Dolbeau wrote:
> On Sun, 5 Jan 2020 at 16:06, Bertrand Dekoninck wrote:
>> I can now test on powermac 7,3 (with an ATI card).
>> How can I build a deb package of this kernel? Or is there a package
>> to download somewhere?

I've tested the 5.5-rc package with the link you gave hereafter. When
you said "working kernel", did you mean it would not crash? It doesn't
for me. I can boot successfully.

The only weirdness I could notice is that my swap space wasn't mounted.
When I tried to run swapon, I was given the following (roughly
translated from French):

swapon: /dev/sdb5: page size doesn't fit the swap space format

  unable to find swap-space signature

So I ran mkswap on the partition; I could then run swapon successfully.

But when I boot the old kernel (5.3), I get the same error again on
swap and I have to format the swap space to mount it manually.

There's something wrong with the swap page size between those two
kernels.

> I usually cross-compile on x86-64 from upstream sources. On a Debian
> Buster with the powerpc tools installed, it's just:
>
> # make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu- oldconfig && nice
> -19 make ARCH=powerpc CROSS_COMPILE=powerpc64-linux-gnu- -j56
> bindeb-pkg
>
> (alter the -j56 for your own build system). For the dependency, as far
> as I remember I only needed "gcc-powerpc64-linux-gnu" and
> dependencies. My '.config' is Debian's 5.3 plus default values for
> changes - with the exception of 4 KiB pages.

I should use the same, and the same config file also.
I'll try this later. Should I download the 5.5-rc1 kernel source or
something else?

> I've also uploaded the working kernel with 4 KiB pages DEB here:
> , as it might be easier for a quick test.
>
> Cordially,

Cheers,

Bertrand




[Bug 206049] alg: skcipher: p8_aes_xts encryption unexpectedly succeeded on test vector "random: len=0 klen=64"; expected_error=-22, cfg="random: inplace may_sleep use_finup src_divs=[66.99%@+1

2020-01-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=206049

--- Comment #4 from Erhard F. (erhar...@mailbox.org) ---
(In reply to Daniel Axtens from comment #2)
> Hi Erhard,
> 
> I'm having a look. Does this reproduce reliably/often? Or was it a one-off?
Hi Daniel,

This shows up every time I boot the Talos II. I have yet to try
Michael's patch.

Regards,
Erhard

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH] powerpc: add support for folded p4d page tables

2020-01-07 Thread Christophe Leroy




On 05/01/2020 at 08:02, Mike Rapoport wrote:
> On Thu, Jan 02, 2020 at 05:42:31PM +0100, Christophe Leroy wrote:
>> Mike Rapoport wrote:
>>> Any updates on this?
>>
>> Checkpatch reported several points, see
>> https://patchwork.ozlabs.org/patch/1206344/
>
> Well, for the most part checkpatch is unhappy because I've tried to keep
> the changes consistent with the old code. And there are some lines over 90
> characters that do not seem worth breaking.

Yes, I see; in many places we have stuff like:

	pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);

I sent a patch to refactor that. It should help fix that long-line
issue and reduce the number of lines impacted by your patch.


Christophe


Re: [PATCH v3] powerpc/kernel/sysfs: Add new config option PMU_SYSFS to enable PMU SPRs sysfs file creation

2020-01-07 Thread Michael Ellerman
Kajol Jain  writes:
> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
> b/arch/powerpc/platforms/Kconfig.cputype
> index 12543e53fa96..58c72b9b8902 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -417,6 +417,12 @@ config PPC_MM_SLICES
>  config PPC_HAVE_PMU_SUPPORT
> bool
>  
> +config PMU_SYSFS
> + bool
> + default n
> + help
> +   This option enables sysfs file creation for PMU SPRs like MMCR* and 
> PMC*.

This isn't user-selectable. It needs a prompt string after `bool` to be
user-selectable.
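
Something like this, for illustration (the prompt text is made up):

config PMU_SYSFS
	bool "Create sysfs files for PMU SPRs"
	default n
	help
	  This option enables sysfs file creation for PMU SPRs like
	  MMCR* and PMC*.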

cheers


[PATCH] powerpc/32: refactor pmd_offset(pud_offset(pgd_offset...

2020-01-07 Thread Christophe Leroy
In several places the pmd pointer is retrieved through the same
sequence:

	pmd = pmd_offset(pud_offset(pgd_offset(mm, addr), addr), addr);

or

	pmd = pmd_offset(pud_offset(pgd_offset_k(addr), addr), addr);

Refactor this by implementing two helpers, pmd_ptr() and pmd_ptr_k().

This will help when adding the p4d level.
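
(Illustration only, not part of this patch: once the p4d level is
folded in, only the helpers need to grow the extra step, e.g.

	static inline pmd_t *pmd_ptr_k(unsigned long va)
	{
		return pmd_offset(pud_offset(p4d_offset(pgd_offset_k(va), va), va), va);
	}

while all the callers stay untouched.)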

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/pgtable.h| 12 
 arch/powerpc/mm/book3s32/mmu.c|  2 +-
 arch/powerpc/mm/book3s32/tlb.c|  4 ++--
 arch/powerpc/mm/kasan/kasan_init_32.c |  8 
 arch/powerpc/mm/mem.c |  3 +--
 arch/powerpc/mm/nohash/40x.c  |  4 ++--
 arch/powerpc/mm/pgtable_32.c  |  2 +-
 7 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable.h 
b/arch/powerpc/include/asm/pgtable.h
index 0e4ec8cc37b7..b5e358c0ea7e 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -41,6 +41,18 @@ struct mm_struct;
 
 #ifndef __ASSEMBLY__
 
+#ifdef CONFIG_PPC32
+static inline pmd_t *pmd_ptr(struct mm_struct *mm, unsigned long va)
+{
+   return pmd_offset(pud_offset(pgd_offset(mm, va), va), va);
+}
+
+static inline pmd_t *pmd_ptr_k(unsigned long va)
+{
+   return pmd_offset(pud_offset(pgd_offset_k(va), va), va);
+}
+#endif
+
 #include 
 
 /* Keep these as a macros to avoid include dependency mess */
diff --git a/arch/powerpc/mm/book3s32/mmu.c b/arch/powerpc/mm/book3s32/mmu.c
index 69b2419accef..91553e1ff4b9 100644
--- a/arch/powerpc/mm/book3s32/mmu.c
+++ b/arch/powerpc/mm/book3s32/mmu.c
@@ -312,7 +312,7 @@ void hash_preload(struct mm_struct *mm, unsigned long ea)
 
if (!Hash)
return;
-   pmd = pmd_offset(pud_offset(pgd_offset(mm, ea), ea), ea);
+   pmd = pmd_ptr(mm, ea);
if (!pmd_none(*pmd))
add_hash_page(mm->context.id, ea, pmd_val(*pmd));
 }
diff --git a/arch/powerpc/mm/book3s32/tlb.c b/arch/powerpc/mm/book3s32/tlb.c
index 2fcd321040ff..b08f0ec7f409 100644
--- a/arch/powerpc/mm/book3s32/tlb.c
+++ b/arch/powerpc/mm/book3s32/tlb.c
@@ -87,7 +87,7 @@ static void flush_range(struct mm_struct *mm, unsigned long 
start,
if (start >= end)
return;
end = (end - 1) | ~PAGE_MASK;
-   pmd = pmd_offset(pud_offset(pgd_offset(mm, start), start), start);
+   pmd = pmd_ptr(mm, start);
for (;;) {
pmd_end = ((start + PGDIR_SIZE) & PGDIR_MASK) - 1;
if (pmd_end > end)
@@ -145,7 +145,7 @@ void flush_tlb_page(struct vm_area_struct *vma, unsigned 
long vmaddr)
return;
}
mm = (vmaddr < TASK_SIZE)? vma->vm_mm: _mm;
-   pmd = pmd_offset(pud_offset(pgd_offset(mm, vmaddr), vmaddr), vmaddr);
+   pmd = pmd_ptr(mm, vmaddr);
if (!pmd_none(*pmd))
flush_hash_pages(mm->context.id, vmaddr, pmd_val(*pmd), 1);
 }
diff --git a/arch/powerpc/mm/kasan/kasan_init_32.c 
b/arch/powerpc/mm/kasan/kasan_init_32.c
index 0e6ed4413eea..4b505ff0ff44 100644
--- a/arch/powerpc/mm/kasan/kasan_init_32.c
+++ b/arch/powerpc/mm/kasan/kasan_init_32.c
@@ -36,7 +36,7 @@ static int __ref kasan_init_shadow_page_tables(unsigned long 
k_start, unsigned l
unsigned long k_cur, k_next;
pgprot_t prot = slab_is_available() ? kasan_prot_ro() : PAGE_KERNEL;
 
-   pmd = pmd_offset(pud_offset(pgd_offset_k(k_start), k_start), k_start);
+   pmd = pmd_ptr_k(k_start);
 
for (k_cur = k_start; k_cur != k_end; k_cur = k_next, pmd++) {
pte_t *new;
@@ -94,7 +94,7 @@ static int __ref kasan_init_region(void *start, size_t size)
block = memblock_alloc(k_end - k_start, PAGE_SIZE);
 
for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
-   pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), 
k_cur);
+   pmd_t *pmd = pmd_ptr_k(k_cur);
void *va = block ? block + k_cur - k_start : 
kasan_get_one_page();
pte_t pte = pfn_pte(PHYS_PFN(__pa(va)), PAGE_KERNEL);
 
@@ -118,7 +118,7 @@ static void __init kasan_remap_early_shadow_ro(void)
kasan_populate_pte(kasan_early_shadow_pte, prot);
 
for (k_cur = k_start & PAGE_MASK; k_cur < k_end; k_cur += PAGE_SIZE) {
-   pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(k_cur), k_cur), 
k_cur);
+   pmd_t *pmd = pmd_ptr_k(k_cur);
pte_t *ptep = pte_offset_kernel(pmd, k_cur);
 
if ((pte_val(*ptep) & PTE_RPN_MASK) != pa)
@@ -205,7 +205,7 @@ void __init kasan_early_init(void)
unsigned long addr = KASAN_SHADOW_START;
unsigned long end = KASAN_SHADOW_END;
unsigned long next;
-   pmd_t *pmd = pmd_offset(pud_offset(pgd_offset_k(addr), addr), addr);
+   pmd_t *pmd = pmd_ptr_k(addr);
 
BUILD_BUG_ON(KASAN_SHADOW_START & ~PGDIR_MASK);
 
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 

[Bug 206049] alg: skcipher: p8_aes_xts encryption unexpectedly succeeded on test vector "random: len=0 klen=64"; expected_error=-22, cfg="random: inplace may_sleep use_finup src_divs=[66.99%@+1

2020-01-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=206049

Michael Ellerman (mich...@ellerman.id.au) changed:

   What|Removed |Added

 Status|ASSIGNED|NEEDINFO

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 206049] alg: skcipher: p8_aes_xts encryption unexpectedly succeeded on test vector "random: len=0 klen=64"; expected_error=-22, cfg="random: inplace may_sleep use_finup src_divs=[66.99%@+1

2020-01-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=206049

Michael Ellerman (mich...@ellerman.id.au) changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
 CC||mich...@ellerman.id.au

--- Comment #3 from Michael Ellerman (mich...@ellerman.id.au) ---
Looks like other implementations reject requests shorter than one AES
block (XTS needs at least a full block), can you try this:

diff --git a/drivers/crypto/vmx/aes_xts.c b/drivers/crypto/vmx/aes_xts.c
index d59e736882f6..9fee1b1532a4 100644
--- a/drivers/crypto/vmx/aes_xts.c
+++ b/drivers/crypto/vmx/aes_xts.c
@@ -84,6 +84,9 @@ static int p8_aes_xts_crypt(struct skcipher_request *req, int
enc)
u8 tweak[AES_BLOCK_SIZE];
int ret;

+   if (req->cryptlen < AES_BLOCK_SIZE)
+   return -EINVAL;
+
if (!crypto_simd_usable() || (req->cryptlen % XTS_BLOCK_SIZE) != 0) {
struct skcipher_request *subreq = skcipher_request_ctx(req);

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v6 3/5] powerpc/mm/ptdump: debugfs handler for W+X checks at runtime

2020-01-07 Thread Michael Ellerman
Christophe Leroy  writes:
> Russell Currey wrote:
>
>> Very rudimentary, just
>>
>>  echo 1 > [debugfs]/check_wx_pages
>>
>> and check the kernel log.  Useful for testing strict module RWX.
>
> For testing strict module RWX you could instead implement  
> module_arch_freeing_init() and call  ptdump_check_wx() from there.

That could get expensive on large systems, not sure if we want it
enabled by default?
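
(For reference, a minimal sketch of the hook Christophe describes -
hypothetical, not part of this series; powerpc would override the weak
module_arch_freeing_init() provided by the module loader:

	#include <linux/module.h>

	/* Assumes ptdump_check_wx()'s declaration is reachable here. */
	void module_arch_freeing_init(struct module *mod)
	{
		if (IS_ENABLED(CONFIG_PPC_DEBUG_WX))
			ptdump_check_wx();
	}
)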

cheers


>> diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
>> index 4e1d39847462..7c14c9728bc0 100644
>> --- a/arch/powerpc/Kconfig.debug
>> +++ b/arch/powerpc/Kconfig.debug
>> @@ -370,7 +370,7 @@ config PPC_PTDUMP
>>If you are unsure, say N.
>>
>>  config PPC_DEBUG_WX
>> -bool "Warn on W+X mappings at boot"
>> +bool "Warn on W+X mappings at boot & enable manual checks at runtime"
>>  depends on PPC_PTDUMP
>>  help
>>Generate a warning if any W+X mappings are found at boot.
>> @@ -384,7 +384,9 @@ config PPC_DEBUG_WX
>>of other unfixed kernel bugs easier.
>>
>>There is no runtime or memory usage effect of this option
>> -  once the kernel has booted up - it's a one time check.
>> +  once the kernel has booted up, it only automatically checks once.
>> +
>> +  Enables the "check_wx_pages" debugfs entry for checking at runtime.
>>
>>If in doubt, say "Y".
>>
>> diff --git a/arch/powerpc/mm/ptdump/ptdump.c  
>> b/arch/powerpc/mm/ptdump/ptdump.c
>> index 2f9ddc29c535..b6cba29ae4a0 100644
>> --- a/arch/powerpc/mm/ptdump/ptdump.c
>> +++ b/arch/powerpc/mm/ptdump/ptdump.c
>> @@ -4,7 +4,7 @@
>>   *
>>   * This traverses the kernel pagetables and dumps the
>>   * information about the used sections of memory to
>> - * /sys/kernel/debug/kernel_pagetables.
>> + * /sys/kernel/debug/kernel_page_tables.
>>   *
>>   * Derived from the arm64 implementation:
>>   * Copyright (c) 2014, The Linux Foundation, Laura Abbott.
>> @@ -409,6 +409,25 @@ void ptdump_check_wx(void)
>>  else
>>  pr_info("Checked W+X mappings: passed, no W+X pages found\n");
>>  }
>> +
>> +static int check_wx_debugfs_set(void *data, u64 val)
>> +{
>> +if (val != 1ULL)
>> +return -EINVAL;
>> +
>> +ptdump_check_wx();
>> +
>> +return 0;
>> +}
>> +
>> +DEFINE_SIMPLE_ATTRIBUTE(check_wx_fops, NULL, check_wx_debugfs_set,  
>> "%llu\n");
>> +
>> +static int ptdump_check_wx_init(void)
>> +{
>> +return debugfs_create_file("check_wx_pages", 0200, NULL,
>> +   NULL, _wx_fops) ? 0 : -ENOMEM;
>> +}
>> +device_initcall(ptdump_check_wx_init);
>>  #endif
>>
>>  static int ptdump_init(void)
>> --
>> 2.24.1


Re: PPC64: G5 & 4k/64k page size (was: Re: Call for report - G5/PPC970 status)

2020-01-07 Thread Michel Dänzer
On 2020-01-06 8:11 p.m., Romain Dolbeau wrote:
> 
> Unfortunately I don't have a PCIe OpenFirmware ATI card to test the
> theory further.

FWIW, a non-OF Radeon >= R(V)5xx card should work in Linux (though
obviously won't light up in OF itself or MacOS).


-- 
Earthling Michel Dänzer   |   https://redhat.com
Libre software enthusiast | Mesa and X developer


Re: [LKP] Re: [mm/debug] 87c4696d57: kernel_BUG_at_include/linux/mm.h

2020-01-07 Thread Rong Chen



On 1/7/20 2:46 PM, Anshuman Khandual wrote:
> On 12/27/2019 07:52 PM, kernel test robot wrote:
>> [9.781974] kernel BUG at include/linux/mm.h:592!
>> [9.782810] invalid opcode:  [#1] PTI
>> [9.783443] CPU: 0 PID: 1 Comm: swapper Not tainted
>> 5.5.0-rc3-1-g87c4696d57b5e #1
>> [9.784528] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
>> 1.10.2-1 04/01/2014
>> [9.785756] EIP: __free_pages+0x14/0x40
>> [9.786442] Code: 0c 9c 5e fa 89 d8 e8 5b f3 ff ff 56 9d 5b 5e 5d c3 8d 74 26 00
>> 90 8b 48 1c 55 89 e5 85 c9 75 16 ba b4 b6 84 d6 e8 ac 49 fe ff <0f> 0b 8d b4 26
>> 00 00 00 00 8d 76 00 ff 48 1c 75 10 85 d2 75 07 e8
>> [9.789697] EAX: d68761f7 EBX: ea52f000 ECX: ea4f8520 EDX: d684b6b4
>> [9.790850] ESI:  EDI: ef45e000 EBP: ea501f08 ESP: ea501f08
>> [9.791879] DS: 007b ES: 007b FS:  GS:  SS: 0068 EFLAGS: 00010286
>> [9.792783] CR0: 80050033 CR2:  CR3: 16d0 CR4: 000406b0
>> [9.792783] Call Trace:
>> [9.792783]  free_pages+0x3c/0x50
>> [9.792783]  pgd_free+0x5a/0x170
>> [9.792783]  __mmdrop+0x42/0xe0
>> [9.792783]  debug_vm_pgtable+0x54f/0x567
>> [9.792783]  kernel_init_freeable+0x90/0x1e3
>> [9.792783]  ? rest_init+0xf0/0xf0
>> [9.792783]  kernel_init+0x8/0xf0
>> [9.792783]  ret_from_fork+0x19/0x24
>> [9.792783] Modules linked in:
>> [9.792803] ---[ end trace 91b7335adcf0b656 ]---
>>
>>
>> To reproduce:
>>
>>         # build kernel
>>         cd linux
>>         cp config-5.5.0-rc3-1-g87c4696d57b5e .config
>>         make HOSTCC=gcc-7 CC=gcc-7 ARCH=i386 olddefconfig prepare
>> modules_prepare bzImage
>>
>>         git clone https://github.com/intel/lkp-tests.git
>>         cd lkp-tests
>>         bin/lkp qemu -k  job-script # job-script is attached in this
>> email
>
> Hello,
>
> As the failure might be happening during boot when the test executes,
> do we really need to run this LKP-based QEMU environment in order to
> reproduce the problem? Could it not be recreated on a standalone
> system?
>
> - Anshuman


Hi Anshuman,

Thanks for your advice. We provided another reproduction method at the
end of the dmesg.xz file attached to the original report, but the
initrd file needs to be supplied.

qemu-img create -f qcow2 disk-vm-snb-89942dcb126f-0 256G
qemu-img create -f qcow2 disk-vm-snb-89942dcb126f-1 256G
qemu-img create -f qcow2 disk-vm-snb-89942dcb126f-2 256G
qemu-img create -f qcow2 disk-vm-snb-89942dcb126f-3 256G
qemu-img create -f qcow2 disk-vm-snb-89942dcb126f-4 256G
qemu-img create -f qcow2 disk-vm-snb-89942dcb126f-5 256G
qemu-img create -f qcow2 disk-vm-snb-89942dcb126f-6 256G

kvm=(
	qemu-system-x86_64
	-enable-kvm
	-cpu SandyBridge
	-kernel $kernel
	-initrd initrd-vm-snb-89942dcb126f
	-m 8192
	-smp 2
	-device e1000,netdev=net0
	-netdev user,id=net0,hostfwd=tcp::32032-:22
	-boot order=nc
	-no-reboot
	-watchdog i6300esb
	-watchdog-action debug
	-rtc base=localtime
	-drive file=disk-vm-snb-89942dcb126f-0,media=disk,if=virtio
	-drive file=disk-vm-snb-89942dcb126f-1,media=disk,if=virtio
	-drive file=disk-vm-snb-89942dcb126f-2,media=disk,if=virtio
	-drive file=disk-vm-snb-89942dcb126f-3,media=disk,if=virtio
	-drive file=disk-vm-snb-89942dcb126f-4,media=disk,if=virtio
	-drive file=disk-vm-snb-89942dcb126f-5,media=disk,if=virtio
	-drive file=disk-vm-snb-89942dcb126f-6,media=disk,if=virtio
	-serial stdio
	-display none
	-monitor null
)

append=(
	ip=vm-snb-89942dcb126f::dhcp
	root=/dev/ram0
	user=lkp
	job=/job-script
	ARCH=i386
	kconfig=i386-randconfig-h001-20191225
	branch=linux-devel/devel-hourly-2019122518
	commit=87c4696d57b5e380bc2720fdff0772b042462b7d
	BOOT_IMAGE=/pkg/linux/i386-randconfig-h001-20191225/gcc-7/87c4696d57b5e380bc2720fdff0772b042462b7d/vmlinuz-5.5.0-rc3-1-g87c4696d57b5e
	max_uptime=1500
	RESULT_ROOT=/result/trinity/300s/vm-snb/yocto-minimal-i386-2019-05-20.cgz/i386-randconfig-h001-20191225/gcc-7/87c4696d57b5e380bc2720fdff0772b042462b7d/8
	result_service=tmpfs
	debug
	apic=debug
	sysrq_always_enabled
	rcupdate.rcu_cpu_stall_timeout=100
	net.ifnames=0
	printk.devkmsg=on
	panic=-1
	softlockup_panic=1
	nmi_watchdog=panic
	oops=panic
	load_ramdisk=2
	prompt_ramdisk=0
	drbd.minor_count=8
	systemd.log_level=err
	ignore_loglevel
	console=tty0
	earlyprintk=ttyS0,115200
	console=ttyS0,115200
	vga=normal
	rw
	rcuperf.shutdown=0
	watchdog_thresh=60
)

"${kvm[@]}" -append "${append[*]}"


We will definitely give serious thought to your suggestion.

Best Regards,
Rong Chen
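
As an aside, the failing check itself can be reasoned about without
the full environment. Assuming mm.h:592 in this tree is the
VM_BUG_ON_PAGE() in put_page_testzero() (consistent with the
__free_pages EIP in the oops), a self-contained userspace mock of the
logic that trips:

#include <assert.h>
#include <stdatomic.h>

struct page { atomic_int _refcount; };	/* stand-in for struct page */

/* Mock of put_page_testzero(): BUG (here: assert) if the refcount is
 * already zero, else decrement and report whether it reached zero. */
static int put_page_testzero(struct page *page)
{
	assert(atomic_load(&page->_refcount) != 0);	/* the VM_BUG_ON_PAGE */
	return atomic_fetch_sub(&page->_refcount, 1) == 1;
}

int main(void)
{
	struct page p = { 1 };

	assert(put_page_testzero(&p));	/* 1 -> 0: page may be freed */
	/* A second put_page_testzero(&p) would trip the assert, which is
	 * the over-free pattern the oops points at. */
	return 0;
}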


[PATCH] powerpc/32: don't restore r0, r6-r8 on exception entry path after trace_hardirqs_off()

2020-01-07 Thread Christophe Leroy
Since commit b86fb88855ea ("powerpc/32: implement fast entry for
syscalls on non BOOKE") and commit 1a4b739bbb4f ("powerpc/32:
implement fast entry for syscalls on BOOKE"), syscalls don't
use the exception entry path anymore. It is therefore pointless
to restore r0 and r6-r8 after calling trace_hardirqs_off().

While at it, drop the '2:' label, which is now unused and misleading.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/entry_32.S | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/entry_32.S b/arch/powerpc/kernel/entry_32.S
index 4a7cd22a8aaf..748a13788b9b 100644
--- a/arch/powerpc/kernel/entry_32.S
+++ b/arch/powerpc/kernel/entry_32.S
@@ -242,9 +242,8 @@ reenable_mmu:
 * r3 can be different from GPR3(r1) at this point, r9 and r11
 * contains the old MSR and handler address respectively,
 * r4 & r5 can contain page fault arguments that need to be passed
-* along as well. r12, CCR, CTR, XER etc... are left clobbered as
-* they aren't useful past this point (aren't syscall arguments),
-* the rest is restored from the exception frame.
+* along as well. r0, r6-r8, r12, CCR, CTR, XER etc... are left
+* clobbered as they aren't useful past this point.
 */
 
	stwu	r1,-32(r1)
@@ -258,16 +257,12 @@ reenable_mmu:
 * lockdep
 */
 1: bl  trace_hardirqs_off
-2: lwz r5,24(r1)
+   lwz r5,24(r1)
lwz r4,20(r1)
lwz r3,16(r1)
lwz r11,12(r1)
lwz r9,8(r1)
	addi	r1,r1,32
-   lwz r0,GPR0(r1)
-   lwz r6,GPR6(r1)
-   lwz r7,GPR7(r1)
-   lwz r8,GPR8(r1)
	mtctr	r11
	mtlr	r9
	bctr		/* jump to handler */
-- 
2.13.3



Re: [PATCH v3 02/22] compat: provide compat_ptr() on all architectures

2020-01-07 Thread H. Peter Anvin

On January 7, 2020 12:08:31 AM PST, Arnd Bergmann  wrote:
>On Tue, Jan 7, 2020 at 3:05 AM Michael Ellerman  wrote:
>> Arnd Bergmann  writes:
>> > +
>> > +static inline compat_uptr_t ptr_to_compat(void __user *uptr)
>> > +{
>> > + return (u32)(unsigned long)uptr;
>> > +}
>>
>> Is there a reason we cast to u32 directly instead of using
>> compat_uptr_t?
>
>Probably Al found this to be more explicit at the time when he
>introduced it on all the architectures in 2005. I just moved it here
>and kept the definition.
>
>   Arnd

Did compat_uptr_t exist back then?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH v3 02/22] compat: provide compat_ptr() on all architectures

2020-01-07 Thread Arnd Bergmann
On Tue, Jan 7, 2020 at 3:05 AM Michael Ellerman  wrote:
> Arnd Bergmann  writes:
> > +
> > +static inline compat_uptr_t ptr_to_compat(void __user *uptr)
> > +{
> > + return (u32)(unsigned long)uptr;
> > +}
>
> Is there a reason we cast to u32 directly instead of using compat_uptr_t?

Probably Al found this to be more explicit at the time when he introduced
it on all the architectures in 2005. I just moved it here and kept the
definition.

   Arnd
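
For reference, a standalone sketch of the pair under discussion,
assuming the generic definitions (compat_uptr_t is u32) and dropping
the __user annotation so it builds as plain userspace C; it shows why
the (u32) and (compat_uptr_t) cast spellings are interchangeable:

#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;
typedef u32 compat_uptr_t;

/* Mirrors the generic compat_ptr()/ptr_to_compat() quoted above. */
static inline void *compat_ptr(compat_uptr_t uptr)
{
	return (void *)(unsigned long)uptr;
}

static inline compat_uptr_t ptr_to_compat(void *uptr)
{
	/* compat_uptr_t is u32, so the two casts are the same cast. */
	return (u32)(unsigned long)uptr;
}

int main(void)
{
	compat_uptr_t cp = 0x1000;

	/* Round trip: 32-bit token -> pointer -> 32-bit token. */
	assert(ptr_to_compat(compat_ptr(cp)) == cp);
	return 0;
}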