[Bug 11143] [PATCH] unconditional linker option arch/powerpc/lib/crtsavres.o causes external module build failure

2021-06-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=11143

Christophe Leroy (christophe.le...@csgroup.eu) changed:

   What|Removed |Added

 CC||christophe.le...@csgroup.eu

--- Comment #19 from Christophe Leroy (christophe.le...@csgroup.eu) ---
Seems like commit efe0160cfd40a99c052a00e174787c1f4158a9cd removes it for
binutils >= 2.25.

Can you check whether it works for you now?

If so, maybe the simplest solution is to wait until the binutils minimum
version is at least 2.25. For the time being, the minimum is 2.23.
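
For reference, a quick way to confirm which binutils version a given build
actually uses (a sketch; the cross-compile prefix below is only an example
and depends on your toolchain):

  # Native build:
  ld --version | head -n1
  # Cross build for powerpc (prefix is an assumption):
  powerpc64-linux-gnu-ld --version | head -n1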

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 209277] powerpc: obsolete driver: Marvell MV64X60 MPSC

2021-06-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=209277

--- Comment #4 from Christophe Leroy (christophe.le...@csgroup.eu) ---
The watchdog patch is flagged 'accepted' in patchwork and should go into 5.14.

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 209277] powerpc: obsolete driver: Marvell MV64X60 MPSC

2021-06-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=209277

Christophe Leroy (christophe.le...@csgroup.eu) changed:

   What|Removed |Added

 CC||christophe.le...@csgroup.eu

--- Comment #3 from Christophe Leroy (christophe.le...@csgroup.eu) ---
https://github.com/linuxppc/linux/commit/a329ddd472fa2af0c19a73b8658898ae7fd658ad
https://github.com/linuxppc/linux/commit/600cc3c9c62defd920da07bc585eb739247bb732

https://patchwork.kernel.org/project/linux-watchdog/patch/9c2952bcfaec3b1789909eaa36bbce2afbfab7ab.1616085654.git.christophe.le...@csgroup.eu/

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

[Bug 205099] KASAN hit at raid6_pq: BUG: Unable to handle kernel data access at 0x00f0fd0d

2021-06-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205099

--- Comment #32 from Christophe Leroy (christophe.le...@csgroup.eu) ---
Is this problem still there with 5.13?

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-18 Thread Nicholas Piggin
Excerpts from Mathieu Desnoyers's message of June 19, 2021 6:09 am:
> - On Jun 18, 2021, at 3:58 PM, Andy Lutomirski l...@kernel.org wrote:
> 
>> On Fri, Jun 18, 2021, at 9:31 AM, Mathieu Desnoyers wrote:
>>> - On Jun 17, 2021, at 8:12 PM, Andy Lutomirski l...@kernel.org wrote:
>>> 
>>> > On 6/17/21 7:47 AM, Mathieu Desnoyers wrote:
>>> > 
>>> >> Please change back this #ifndef / #else / #endif within function for
>>> >> 
>>> >> if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
>>> >>   ...
>>> >> } else {
>>> >>   ...
>>> >> }
>>> >> 
>>> >> I don't think mixing up preprocessor and code logic makes it more 
>>> >> readable.
>>> > 
>>> > I agree, but I don't know how to make the result work well.
>>> > membarrier_sync_core_before_usermode() isn't defined in the !IS_ENABLED
>>> > case, so either I need to fake up a definition or use #ifdef.
>>> > 
>>> > If I faked up a definition, I would want to assert, at build time, that
>>> > it isn't called.  I don't think we can do:
>>> > 
>>> > static void membarrier_sync_core_before_usermode()
>>> > {
>>> >BUILD_BUG_IF_REACHABLE();
>>> > }
>>> 
>>> Let's look at the context here:
>>> 
>>> static void ipi_sync_core(void *info)
>>> {
>>> [...]
>>> membarrier_sync_core_before_usermode()
>>> }
>>> 
>>> ^ this can be within #ifdef / #endif
>>> 
>>> static int membarrier_private_expedited(int flags, int cpu_id)
>>> [...]
>>>if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
>>> return -EINVAL;
>>> if (!(atomic_read(&mm->membarrier_state) &
>>>   MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
>>> return -EPERM;
>>> ipi_func = ipi_sync_core;
>>> 
>>> All we need to make the line above work is to define an empty ipi_sync_core
>>> function in the #else case after the ipi_sync_core() function definition.
>>> 
>>> Or am I missing your point ?
>> 
>> Maybe?
>> 
>> My objection is that an empty ipi_sync_core is a lie — it doesn’t sync the
>> core. I would be fine with that if I could have the compiler statically
>> verify that it’s not called, but I’m uncomfortable having it there if the
>> implementation is actively incorrect.
> 
> I see. Another approach would be to implement a "setter" function to populate
> "ipi_func". That setter function would return -EINVAL in its
> #ifndef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE implementation.

I still don't get the problem with my suggestion. Sure the 
ipi is a "lie", but it doesn't get used. That's how a lot of
ifdef folding works out. E.g.,

diff --git a/kernel/sched/membarrier.c b/kernel/sched/membarrier.c
index b5add64d9698..54cb32d064af 100644
--- a/kernel/sched/membarrier.c
+++ b/kernel/sched/membarrier.c
@@ -5,6 +5,15 @@
  * membarrier system call
  */
 #include "sched.h"
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
+#include 
+#else
+static inline void membarrier_sync_core_before_usermode(void)
+{
+   compiletime_assert(0, "architecture does not implement membarrier_sync_core_before_usermode");
+}
+
+#endif
 
 /*
  * For documentation purposes, here are some membarrier ordering
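
For what it's worth, a minimal standalone sketch (not kernel code) of why
the compiletime_assert() stub above is safe: with constant folding enabled,
the unreachable call is discarded before it can trip the assertion. Here an
undefined extern symbol stands in for compiletime_assert, so the link only
fails if the call survives optimization:

#define HAVE_SYNC_CORE 0	/* plays the role of IS_ENABLED(...) */

static inline void sync_core_stub(void)
{
	/* stand-in for compiletime_assert(0, "..."): calling an undefined
	 * symbol breaks the link only if this call is actually emitted */
	extern void __sync_core_not_implemented(void);
	__sync_core_not_implemented();
}

void do_membarrier(void)
{
	if (!HAVE_SYNC_CORE)
		return;			/* like the early -EINVAL return */
	sync_core_stub();		/* dead code: folded away at -O2 */
}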


Re: [PATCH v13 00/12] Restricted DMA

2021-06-18 Thread Claire Chang
v14: https://lore.kernel.org/patchwork/cover/1448954/


[PATCH v14 12/12] of: Add plumbing for restricted DMA pool

2021-06-18 Thread Claire Chang
If a device is not behind an IOMMU, we look up the device node and set
up the restricted DMA when the restricted-dma-pool is present.

Signed-off-by: Claire Chang 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 drivers/of/address.c| 33 +
 drivers/of/device.c |  3 +++
 drivers/of/of_private.h |  6 ++
 3 files changed, 42 insertions(+)

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 73ddf2540f3f..cdf700fba5c4 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1022,6 +1023,38 @@ int of_dma_get_range(struct device_node *np, const struct bus_dma_region **map)
of_node_put(node);
return ret;
 }
+
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np)
+{
+   struct device_node *node, *of_node = dev->of_node;
+   int count, i;
+
+   count = of_property_count_elems_of_size(of_node, "memory-region",
+   sizeof(u32));
+   /*
+* If dev->of_node doesn't exist or doesn't contain memory-region, try
+* the OF node having DMA configuration.
+*/
+   if (count <= 0) {
+   of_node = np;
+   count = of_property_count_elems_of_size(
+   of_node, "memory-region", sizeof(u32));
+   }
+
+   for (i = 0; i < count; i++) {
+   node = of_parse_phandle(of_node, "memory-region", i);
+   /*
+* There might be multiple memory regions, but only one
+* restricted-dma-pool region is allowed.
+*/
+   if (of_device_is_compatible(node, "restricted-dma-pool") &&
+   of_device_is_available(node))
+   return of_reserved_mem_device_init_by_idx(dev, of_node,
+ i);
+   }
+
+   return 0;
+}
 #endif /* CONFIG_HAS_DMA */
 
 /**
diff --git a/drivers/of/device.c b/drivers/of/device.c
index 6cb86de404f1..e68316836a7a 100644
--- a/drivers/of/device.c
+++ b/drivers/of/device.c
@@ -165,6 +165,9 @@ int of_dma_configure_id(struct device *dev, struct device_node *np,
 
arch_setup_dma_ops(dev, dma_start, size, iommu, coherent);
 
+   if (!iommu)
+   return of_dma_set_restricted_buffer(dev, np);
+
return 0;
 }
 EXPORT_SYMBOL_GPL(of_dma_configure_id);
diff --git a/drivers/of/of_private.h b/drivers/of/of_private.h
index d9e6a324de0a..25cebbed5f02 100644
--- a/drivers/of/of_private.h
+++ b/drivers/of/of_private.h
@@ -161,12 +161,18 @@ struct bus_dma_region;
 #if defined(CONFIG_OF_ADDRESS) && defined(CONFIG_HAS_DMA)
 int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map);
+int of_dma_set_restricted_buffer(struct device *dev, struct device_node *np);
 #else
 static inline int of_dma_get_range(struct device_node *np,
const struct bus_dma_region **map)
 {
return -ENODEV;
 }
+static inline int of_dma_set_restricted_buffer(struct device *dev,
+  struct device_node *np)
+{
+   return -ENODEV;
+}
 #endif
 
 #endif /* _LINUX_OF_PRIVATE_H */
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 11/12] dt-bindings: of: Add restricted DMA pool

2021-06-18 Thread Claire Chang
Introduce the new compatible string, restricted-dma-pool, for restricted
DMA. One can specify the address and length of the restricted DMA memory
region by restricted-dma-pool in the reserved-memory node.

Signed-off-by: Claire Chang 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 .../reserved-memory/reserved-memory.txt   | 36 +--
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index e8d3096d922c..39b5f4c5a511 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -51,6 +51,23 @@ compatible (optional) - standard definition
   used as a shared pool of DMA buffers for a set of devices. It can
   be used by an operating system to instantiate the necessary pool
   management subsystem if necessary.
+- restricted-dma-pool: This indicates a region of memory meant to be
+  used as a pool of restricted DMA buffers for a set of devices. The
+  memory region would be the only region accessible to those devices.
+  When using this, the no-map and reusable properties must not be set,
+  so the operating system can create a virtual mapping that will be used
+  for synchronization. The main purpose for restricted DMA is to
+  mitigate the lack of DMA access control on systems without an IOMMU,
+  which could result in the DMA accessing the system memory at
+  unexpected times and/or unexpected addresses, possibly leading to data
+  leakage or corruption. The feature on its own provides a basic level
+  of protection against the DMA overwriting buffer contents at
+  unexpected times. However, to protect against general data leakage and
+  system memory corruption, the system needs to provide a way to lock down
+  the memory access, e.g., MPU. Note that since coherent allocation
+  needs remapping, one must set up another device coherent pool by
+  shared-dma-pool and use dma_alloc_from_dev_coherent instead for atomic
+  coherent allocation.
 - vendor specific string in the form ,[-]
 no-map (optional) - empty property
 - Indicates the operating system must not create a virtual mapping
@@ -85,10 +102,11 @@ memory-region-names (optional) - a list of names, one for each corresponding
 
 Example
 ---
-This example defines 3 contiguous regions are defined for Linux kernel:
+This example defines 4 contiguous regions for Linux kernel:
 one default of all device drivers (named linux,cma@7200 and 64MiB in size),
-one dedicated to the framebuffer device (named framebuffer@7800, 8MiB), and
-one for multimedia processing (named multimedia-memory@7700, 64MiB).
+one dedicated to the framebuffer device (named framebuffer@7800, 8MiB),
+one for multimedia processing (named multimedia-memory@7700, 64MiB), and
+one for restricted dma pool (named restricted_dma_reserved@0x5000, 64MiB).
 
 / {
#address-cells = <1>;
@@ -120,6 +138,11 @@ one for multimedia processing (named multimedia-memory@7700, 64MiB).
compatible = "acme,multimedia-memory";
reg = <0x7700 0x400>;
};
+
+   restricted_dma_reserved: restricted_dma_reserved {
+   compatible = "restricted-dma-pool";
+   reg = <0x5000 0x400>;
+   };
};
 
/* ... */
@@ -138,4 +161,11 @@ one for multimedia processing (named multimedia-memory@7700, 64MiB).
memory-region = <&multimedia_reserved>;
/* ... */
};
+
+   pcie_device: pcie_device@0,0 {
+   reg = <0x8301 0x0 0x 0x0 0x0010
+  0x8301 0x0 0x0010 0x0 0x0010>;
+   memory-region = <&restricted_dma_reserved>;
+   /* ... */
+   };
 };
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 10/12] swiotlb: Add restricted DMA pool initialization

2021-06-18 Thread Claire Chang
Add the initialization function to create restricted DMA pools from
matching reserved-memory nodes.

Regardless of swiotlb setting, the restricted DMA pool is preferred if
available.

The restricted DMA pools provide a basic level of protection against the
DMA overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system
needs to provide a way to lock down the memory access, e.g., MPU.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 include/linux/swiotlb.h |  3 +-
 kernel/dma/Kconfig  | 14 
 kernel/dma/swiotlb.c| 76 +
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index a73fad460162..175b6c113ed8 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -73,7 +73,8 @@ extern enum swiotlb_force swiotlb_force;
  * range check to see if the memory was in fact allocated by this
  * API.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
- * @end. This is command line adjustable via setup_io_tlb_npages.
+ * @end. For default swiotlb, this is command line adjustable via
+ * setup_io_tlb_npages.
  * @used:  The number of used IO TLB block.
  * @list:  The free list describing the number of free entries available
  * from each index.
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 77b405508743..3e961dc39634 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -80,6 +80,20 @@ config SWIOTLB
bool
select NEED_DMA_MAP_STATE
 
+config DMA_RESTRICTED_POOL
+   bool "DMA Restricted Pool"
+   depends on OF && OF_RESERVED_MEM
+   select SWIOTLB
+   help
+ This enables support for restricted DMA pools which provide a level of
+ DMA memory protection on systems with limited hardware protection
+ capabilities, such as those lacking an IOMMU.
+
+ For more information see
+ 

+ and .
+ If unsure, say "n".
+
 #
 # Should be selected if we can mmap non-coherent mappings to userspace.
 # The only thing that is really required is a way to set an uncached bit
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 273b21090ee8..1aef294c82b5 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -39,6 +39,13 @@
 #ifdef CONFIG_DEBUG_FS
 #include 
 #endif
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+#include 
+#include 
+#include 
+#include 
+#include 
+#endif
 
 #include 
 #include 
@@ -736,4 +743,73 @@ bool swiotlb_free(struct device *dev, struct page *page, size_t size)
return true;
 }
 
+static int rmem_swiotlb_device_init(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   struct io_tlb_mem *mem = rmem->priv;
+   unsigned long nslabs = rmem->size >> IO_TLB_SHIFT;
+
+   /*
+* Since multiple devices can share the same pool, the private data,
+* io_tlb_mem struct, will be initialized by the first device attached
+* to it.
+*/
+   if (!mem) {
+   mem = kzalloc(struct_size(mem, slots, nslabs), GFP_KERNEL);
+   if (!mem)
+   return -ENOMEM;
+
+   set_memory_decrypted((unsigned long)phys_to_virt(rmem->base),
+rmem->size >> PAGE_SHIFT);
+   swiotlb_init_io_tlb_mem(mem, rmem->base, nslabs, false);
+   mem->force_bounce = true;
+   mem->for_alloc = true;
+
+   rmem->priv = mem;
+
+   if (IS_ENABLED(CONFIG_DEBUG_FS)) {
+   mem->debugfs =
+   debugfs_create_dir(rmem->name, debugfs_dir);
+   swiotlb_create_debugfs_files(mem);
+   }
+   }
+
+   dev->dma_io_tlb_mem = mem;
+
+   return 0;
+}
+
+static void rmem_swiotlb_device_release(struct reserved_mem *rmem,
+   struct device *dev)
+{
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+}
+
+static const struct reserved_mem_ops rmem_swiotlb_ops = {
+   .device_init = rmem_swiotlb_device_init,
+   .device_release = rmem_swiotlb_device_release,
+};
+
+static int __init rmem_swiotlb_setup(struct reserved_mem *rmem)
+{
+   unsigned long node = rmem->fdt_node;
+
+   if (of_get_flat_dt_prop(node, "reusable", NULL) ||
+   of_get_flat_dt_prop(node, "linux,cma-default", NULL) ||
+   of_get_flat_dt_prop(node, "linux,dma-default", NULL) ||
+   of_get_flat_dt_prop(node, "no-map", NULL))
+   return -EINVAL;
+
+   if (PageHighMem(pfn_to_page(PHYS_PFN(rmem->base)))) {
+   pr_err("Restricted DMA pool must be accessible within the linear mapping.");
+   return -EINVAL

[PATCH v14 09/12] swiotlb: Add restricted DMA alloc/free support

2021-06-18 Thread Claire Chang
Add the functions, swiotlb_{alloc,free} and is_swiotlb_for_alloc to
support the memory allocation from restricted DMA pool.

The restricted DMA pool is preferred if available.

Note that since coherent allocation needs remapping, one must set up
another device coherent pool by shared-dma-pool and use
dma_alloc_from_dev_coherent instead for atomic coherent allocation.
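
For illustration, a device-tree sketch of that combination (node names and
addresses here are made up, not taken from this series): the
restricted-dma-pool region backs streaming DMA and swiotlb_alloc(), while a
separate shared-dma-pool region serves dma_alloc_from_dev_coherent():

	reserved-memory {
		#address-cells = <1>;
		#size-cells = <1>;
		ranges;

		restricted_dma: restricted-dma@50000000 {
			compatible = "restricted-dma-pool";
			reg = <0x50000000 0x400000>;
		};

		coherent_pool: coherent-pool@50400000 {
			compatible = "shared-dma-pool";
			reg = <0x50400000 0x100000>;
			no-map;
		};
	};

	wifi@0 {
		memory-region = <&restricted_dma>, <&coherent_pool>;
	};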

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 include/linux/swiotlb.h | 26 ++
 kernel/dma/direct.c | 49 +++--
 kernel/dma/swiotlb.c| 38 ++--
 3 files changed, 99 insertions(+), 14 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 8d8855c77d9a..a73fad460162 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -85,6 +85,7 @@ extern enum swiotlb_force swiotlb_force;
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
+ * @for_alloc:  %true if the pool is used for memory allocation
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -96,6 +97,7 @@ struct io_tlb_mem {
struct dentry *debugfs;
bool late_alloc;
bool force_bounce;
+   bool for_alloc;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -156,4 +158,28 @@ static inline void swiotlb_adjust_size(unsigned long size)
 extern void swiotlb_print_info(void);
 extern void swiotlb_set_max_segment(unsigned int);
 
+#ifdef CONFIG_DMA_RESTRICTED_POOL
+struct page *swiotlb_alloc(struct device *dev, size_t size);
+bool swiotlb_free(struct device *dev, struct page *page, size_t size);
+
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->for_alloc;
+}
+#else
+static inline struct page *swiotlb_alloc(struct device *dev, size_t size)
+{
+   return NULL;
+}
+static inline bool swiotlb_free(struct device *dev, struct page *page,
+   size_t size)
+{
+   return false;
+}
+static inline bool is_swiotlb_for_alloc(struct device *dev)
+{
+   return false;
+}
+#endif /* CONFIG_DMA_RESTRICTED_POOL */
+
 #endif /* __LINUX_SWIOTLB_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a92465b4eb12..2de33e5d302b 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,15 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
 }
 
+static void __dma_direct_free_pages(struct device *dev, struct page *page,
+   size_t size)
+{
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   swiotlb_free(dev, page, size))
+   return;
+   dma_free_contiguous(dev, page, size);
+}
+
 static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp)
 {
@@ -86,6 +95,16 @@ static struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   &phys_limit);
+   if (IS_ENABLED(CONFIG_DMA_RESTRICTED_POOL) &&
+   is_swiotlb_for_alloc(dev)) {
+   page = swiotlb_alloc(dev, size);
+   if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
+   __dma_direct_free_pages(dev, page, size);
+   return NULL;
+   }
+   return page;
+   }
+
page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
@@ -142,7 +161,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
gfp |= __GFP_NOWARN;
 
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
-   !force_dma_unencrypted(dev)) {
+   !force_dma_unencrypted(dev) && !is_swiotlb_for_alloc(dev)) {
page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
if (!page)
return NULL;
@@ -155,18 +174,23 @@ void *dma_direct_alloc(struct device *dev, size_t size,
}
 
if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
-   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   !dev_is_dma_coherent(dev))
+   !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !dev_is_dma_coherent(dev) &&
+   !is_swiotlb_for_alloc(dev))
return arch_dma_alloc(dev, size, dma_handle, gfp, attrs);
 
/*
 * Remapping or decrypting memory may block. If either is required and
 * we can't block, allocate the memory from the atomic pools.
+* If restricted DMA (i.e., is_swiotlb_for_alloc) i

[PATCH v14 08/12] swiotlb: Refactor swiotlb_tbl_unmap_single

2021-06-18 Thread Claire Chang
Add a new function, swiotlb_release_slots, to make the code reusable for
supporting different bounce buffer pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index daf38a52e66d..e79383df5d4a 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -556,27 +556,15 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return tlb_addr;
 }
 
-/*
- * tlb_addr is the physical address of the bounce buffer to unmap.
- */
-void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
- size_t mapping_size, enum dma_data_direction dir,
- unsigned long attrs)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
 {
-   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long flags;
-   unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
+   unsigned int offset = swiotlb_align_offset(dev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
int nslots = nr_slots(mem->slots[index].alloc_size + offset);
int count, i;
 
-   /*
-* First, sync the memory before unmapping the entry
-*/
-   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
-   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-   swiotlb_bounce(hwdev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
-
/*
 * Return the buffer to the free list by setting the corresponding
 * entries to indicate the number of contiguous entries available.
@@ -611,6 +599,23 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
spin_unlock_irqrestore(&mem->lock, flags);
 }
 
+/*
+ * tlb_addr is the physical address of the bounce buffer to unmap.
+ */
+void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
+ size_t mapping_size, enum dma_data_direction dir,
+ unsigned long attrs)
+{
+   /*
+* First, sync the memory before unmapping the entry
+*/
+   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
+   (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
+   swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+
+   swiotlb_release_slots(dev, tlb_addr);
+}
+
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
size_t size, enum dma_data_direction dir)
 {
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 07/12] swiotlb: Move alloc_size to swiotlb_find_slots

2021-06-18 Thread Claire Chang
Rename find_slots to swiotlb_find_slots and move the maintenance of
alloc_size to it for better code reusability later.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 0d294bbf274c..daf38a52e66d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -432,8 +432,8 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
  * Find a suitable number of IO TLB entries size that will fit this request and
  * allocate a buffer from that IO TLB pool.
  */
-static int find_slots(struct device *dev, phys_addr_t orig_addr,
-   size_t alloc_size)
+static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
+ size_t alloc_size)
 {
struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
@@ -488,8 +488,11 @@ static int find_slots(struct device *dev, phys_addr_t orig_addr,
return -1;
 
 found:
-   for (i = index; i < index + nslots; i++)
+   for (i = index; i < index + nslots; i++) {
mem->slots[i].list = 0;
+   mem->slots[i].alloc_size =
+   alloc_size - ((i - index) << IO_TLB_SHIFT);
+   }
for (i = index - 1;
 io_tlb_offset(i) != IO_TLB_SEGSIZE - 1 &&
 mem->slots[i].list; i--)
@@ -530,7 +533,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
return (phys_addr_t)DMA_MAPPING_ERROR;
}
 
-   index = find_slots(dev, orig_addr, alloc_size + offset);
+   index = swiotlb_find_slots(dev, orig_addr, alloc_size + offset);
if (index == -1) {
if (!(attrs & DMA_ATTR_NO_WARN))
dev_warn_ratelimited(dev,
@@ -544,11 +547,8 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 * This is needed when we sync the memory.  Then we sync the buffer if
 * needed.
 */
-   for (i = 0; i < nr_slots(alloc_size + offset); i++) {
+   for (i = 0; i < nr_slots(alloc_size + offset); i++)
mem->slots[index + i].orig_addr = slot_addr(orig_addr, i);
-   mem->slots[index + i].alloc_size =
-   alloc_size - (i << IO_TLB_SHIFT);
-   }
tlb_addr = slot_addr(mem->start, index) + offset;
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 06/12] swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing

2021-06-18 Thread Claire Chang
Propagate the swiotlb_force into io_tlb_default_mem->force_bounce and
use it to determine whether to bounce the data or not. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   | 11 +++
 kernel/dma/direct.c   |  2 +-
 kernel/dma/direct.h   |  2 +-
 kernel/dma/swiotlb.c  |  4 
 5 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 0c6ed09f8513..4730a146fa35 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -369,7 +369,7 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
if (dma_capable(dev, dev_addr, size, true) &&
!range_straddles_page_boundary(phys, size) &&
!xen_arch_need_swiotlb(dev, phys, dev_addr) &&
-   swiotlb_force != SWIOTLB_FORCE)
+   !is_swiotlb_force_bounce(dev))
goto done;
 
/*
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index dd1c30a83058..8d8855c77d9a 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -84,6 +84,7 @@ extern enum swiotlb_force swiotlb_force;
  * unmap calls.
  * @debugfs:   The dentry to debugfs.
  * @late_alloc:%true if allocated using the page allocator
+ * @force_bounce: %true if swiotlb bouncing is forced
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -94,6 +95,7 @@ struct io_tlb_mem {
spinlock_t lock;
struct dentry *debugfs;
bool late_alloc;
+   bool force_bounce;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -109,6 +111,11 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
return mem && paddr >= mem->start && paddr < mem->end;
 }
 
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+   return dev->dma_io_tlb_mem->force_bounce;
+}
+
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
@@ -120,6 +127,10 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
+static inline bool is_swiotlb_force_bounce(struct device *dev)
+{
+   return false;
+}
 static inline void swiotlb_exit(void)
 {
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 7a88c34d0867..a92465b4eb12 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -496,7 +496,7 @@ size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
if (is_swiotlb_active(dev) &&
-   (dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
+   (dma_addressing_limited(dev) || is_swiotlb_force_bounce(dev)))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
 }
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 13e9e7158d94..4632b0f4f72e 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -87,7 +87,7 @@ static inline dma_addr_t dma_direct_map_page(struct device *dev,
phys_addr_t phys = page_to_phys(page) + offset;
dma_addr_t dma_addr = phys_to_dma(dev, phys);
 
-   if (unlikely(swiotlb_force == SWIOTLB_FORCE))
+   if (is_swiotlb_force_bounce(dev))
return swiotlb_map(dev, phys, size, dir, attrs);
 
if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 8a120f42340b..0d294bbf274c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -179,6 +179,10 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->end = mem->start + bytes;
mem->index = 0;
mem->late_alloc = late_alloc;
+
+   if (swiotlb_force == SWIOTLB_FORCE)
+   mem->force_bounce = true;
+
spin_lock_init(&mem->lock);
for (i = 0; i < mem->nslabs; i++) {
mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 05/12] swiotlb: Update is_swiotlb_active to add a struct device argument

2021-06-18 Thread Claire Chang
Update is_swiotlb_active to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 drivers/gpu/drm/i915/gem/i915_gem_internal.c | 2 +-
 drivers/gpu/drm/nouveau/nouveau_ttm.c| 2 +-
 drivers/pci/xen-pcifront.c   | 2 +-
 include/linux/swiotlb.h  | 4 ++--
 kernel/dma/direct.c  | 2 +-
 kernel/dma/swiotlb.c | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_internal.c b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
index a9d65fc8aa0e..4b7afa0fc85d 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_internal.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_internal.c
@@ -42,7 +42,7 @@ static int i915_gem_object_get_pages_internal(struct drm_i915_gem_object *obj)
 
max_order = MAX_ORDER;
 #ifdef CONFIG_SWIOTLB
-   if (is_swiotlb_active()) {
+   if (is_swiotlb_active(obj->base.dev->dev)) {
unsigned int max_segment;
 
max_segment = swiotlb_max_segment();
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 9662522aa066..be15bfd9e0ee 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -321,7 +321,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
 #if IS_ENABLED(CONFIG_SWIOTLB) && IS_ENABLED(CONFIG_X86)
-   need_swiotlb = is_swiotlb_active();
+   need_swiotlb = is_swiotlb_active(dev->dev);
 #endif
 
ret = ttm_bo_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index b7a8f3a1921f..0d56985bfe81 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -693,7 +693,7 @@ static int pcifront_connect_and_init_dma(struct pcifront_device *pdev)
 
spin_unlock(&pcifront_dev_lock);
 
-   if (!err && !is_swiotlb_active()) {
+   if (!err && !is_swiotlb_active(&pdev->xdev->dev)) {
err = pci_xen_swiotlb_init_late();
if (err)
dev_err(&pdev->xdev->dev, "Could not setup SWIOTLB!\n");
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d1f3d95881cd..dd1c30a83058 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -112,7 +112,7 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 void __init swiotlb_exit(void);
 unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
-bool is_swiotlb_active(void);
+bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
@@ -132,7 +132,7 @@ static inline size_t swiotlb_max_mapping_size(struct device *dev)
return SIZE_MAX;
 }
 
-static inline bool is_swiotlb_active(void)
+static inline bool is_swiotlb_active(struct device *dev)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 84c9feb5474a..7a88c34d0867 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -495,7 +495,7 @@ int dma_direct_supported(struct device *dev, u64 mask)
 size_t dma_direct_max_mapping_size(struct device *dev)
 {
/* If SWIOTLB is active, use its maximum mapping size */
-   if (is_swiotlb_active() &&
+   if (is_swiotlb_active(dev) &&
(dma_addressing_limited(dev) || swiotlb_force == SWIOTLB_FORCE))
return swiotlb_max_mapping_size(dev);
return SIZE_MAX;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 72a4289faed1..8a120f42340b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -664,9 +664,9 @@ size_t swiotlb_max_mapping_size(struct device *dev)
return ((size_t)IO_TLB_SIZE) * IO_TLB_SEGSIZE;
 }
 
-bool is_swiotlb_active(void)
+bool is_swiotlb_active(struct device *dev)
 {
-   return io_tlb_default_mem != NULL;
+   return dev->dma_io_tlb_mem != NULL;
 }
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 04/12] swiotlb: Update is_swiotlb_buffer to add a struct device argument

2021-06-18 Thread Claire Chang
Update is_swiotlb_buffer to add a struct device argument. This will be
useful later to allow for different pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 drivers/iommu/dma-iommu.c | 12 ++--
 drivers/xen/swiotlb-xen.c |  2 +-
 include/linux/swiotlb.h   |  7 ---
 kernel/dma/direct.c   |  6 +++---
 kernel/dma/direct.h   |  6 +++---
 5 files changed, 17 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 3087d9fa6065..10997ef541f8 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -507,7 +507,7 @@ static void __iommu_dma_unmap_swiotlb(struct device *dev, dma_addr_t dma_addr,
 
__iommu_dma_unmap(dev, dma_addr, size);
 
-   if (unlikely(is_swiotlb_buffer(phys)))
+   if (unlikely(is_swiotlb_buffer(dev, phys)))
swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
 }
 
@@ -578,7 +578,7 @@ static dma_addr_t __iommu_dma_map_swiotlb(struct device *dev, phys_addr_t phys,
}
 
iova = __iommu_dma_map(dev, phys, aligned_size, prot, dma_mask);
-   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(phys))
+   if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
swiotlb_tbl_unmap_single(dev, phys, org_size, dir, attrs);
return iova;
 }
@@ -749,7 +749,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(phys, size, dir);
 
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_cpu(dev, phys, size, dir);
 }
 
@@ -762,7 +762,7 @@ static void iommu_dma_sync_single_for_device(struct device *dev,
return;
 
phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-   if (is_swiotlb_buffer(phys))
+   if (is_swiotlb_buffer(dev, phys))
swiotlb_sync_single_for_device(dev, phys, size, dir);
 
if (!dev_is_dma_coherent(dev))
@@ -783,7 +783,7 @@ static void iommu_dma_sync_sg_for_cpu(struct device *dev,
if (!dev_is_dma_coherent(dev))
arch_sync_dma_for_cpu(sg_phys(sg), sg->length, dir);
 
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_cpu(dev, sg_phys(sg),
sg->length, dir);
}
@@ -800,7 +800,7 @@ static void iommu_dma_sync_sg_for_device(struct device *dev,
return;
 
for_each_sg(sgl, sg, nelems, i) {
-   if (is_swiotlb_buffer(sg_phys(sg)))
+   if (is_swiotlb_buffer(dev, sg_phys(sg)))
swiotlb_sync_single_for_device(dev, sg_phys(sg),
   sg->length, dir);
 
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 4c89afc0df62..0c6ed09f8513 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 * in our domain. Therefore _only_ check address within our domain.
 */
if (pfn_valid(PFN_DOWN(paddr)))
-   return is_swiotlb_buffer(paddr);
+   return is_swiotlb_buffer(dev, paddr);
return 0;
 }
 
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 216854a5e513..d1f3d95881cd 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -2,6 +2,7 @@
 #ifndef __LINUX_SWIOTLB_H
 #define __LINUX_SWIOTLB_H
 
+#include 
 #include 
 #include 
 #include 
@@ -101,9 +102,9 @@ struct io_tlb_mem {
 };
 extern struct io_tlb_mem *io_tlb_default_mem;
 
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
return mem && paddr >= mem->start && paddr < mem->end;
 }
@@ -115,7 +116,7 @@ bool is_swiotlb_active(void);
 void __init swiotlb_adjust_size(unsigned long size);
 #else
 #define swiotlb_force SWIOTLB_NO_FORCE
-static inline bool is_swiotlb_buffer(phys_addr_t paddr)
+static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
return false;
 }
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index f737e3347059..84c9feb5474a 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -343,7 +343,7 @@ void dma_direct_sync_sg_for_device(struct device *dev,
for_each_sg(sgl, sg, nents, i) {
phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-   if (unlikely(is_swiotlb_buffer(paddr)))
+   if (unlikely(is_swiotlb_buffer(dev, paddr)))
  

[PATCH v14 03/12] swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used

2021-06-18 Thread Claire Chang
Always have the pointer to the swiotlb pool used in struct device. This
could help simplify the code for other pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
Acked-by: Stefano Stabellini 
---
 drivers/base/core.c| 4 
 include/linux/device.h | 4 
 kernel/dma/swiotlb.c   | 8 
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index f29839382f81..cb3123e3954d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include  /* for dma_default_coherent */
 
@@ -2736,6 +2737,9 @@ void device_initialize(struct device *dev)
 defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL)
dev->dma_coherent = dma_default_coherent;
 #endif
+#ifdef CONFIG_SWIOTLB
+   dev->dma_io_tlb_mem = io_tlb_default_mem;
+#endif
 }
 EXPORT_SYMBOL_GPL(device_initialize);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index ba660731bd25..240d652a0696 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -416,6 +416,7 @@ struct dev_links_info {
  * @dma_pools: Dma pools (if dma'ble device).
  * @dma_mem:   Internal for coherent mem override.
  * @cma_area:  Contiguous memory area for dma allocations
+ * @dma_io_tlb_mem: Pointer to the swiotlb pool used.  Not for driver use.
  * @archdata:  For arch-specific additions.
  * @of_node:   Associated device tree node.
  * @fwnode:Associated device node supplied by platform firmware.
@@ -518,6 +519,9 @@ struct device {
 #ifdef CONFIG_DMA_CMA
struct cma *cma_area;   /* contiguous memory area for dma
   allocations */
+#endif
+#ifdef CONFIG_SWIOTLB
+   struct io_tlb_mem *dma_io_tlb_mem;
 #endif
/* arch specific additions */
struct dev_archdata archdata;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index ede66df6835b..72a4289faed1 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -340,7 +340,7 @@ void __init swiotlb_exit(void)
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
    enum dma_data_direction dir)
   enum dma_data_direction dir)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
unsigned int offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1);
phys_addr_t orig_addr = mem->slots[index].orig_addr;
@@ -431,7 +431,7 @@ static unsigned int wrap_index(struct io_tlb_mem *mem, unsigned int index)
 static int find_slots(struct device *dev, phys_addr_t orig_addr,
size_t alloc_size)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned long boundary_mask = dma_get_seg_boundary(dev);
dma_addr_t tbl_dma_addr =
phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
@@ -508,7 +508,7 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
size_t mapping_size, size_t alloc_size,
enum dma_data_direction dir, unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
unsigned int offset = swiotlb_align_offset(dev, orig_addr);
unsigned int i;
int index;
@@ -559,7 +559,7 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, phys_addr_t tlb_addr,
  size_t mapping_size, enum dma_data_direction dir,
  unsigned long attrs)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
+   struct io_tlb_mem *mem = hwdev->dma_io_tlb_mem;
unsigned long flags;
unsigned int offset = swiotlb_align_offset(hwdev, tlb_addr);
int index = (tlb_addr - offset - mem->start) >> IO_TLB_SHIFT;
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 02/12] swiotlb: Refactor swiotlb_create_debugfs

2021-06-18 Thread Claire Chang
Split the debugfs creation to make the code reusable for supporting
different bounce buffer pools.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1f9b2b9e7490..ede66df6835b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -671,19 +671,26 @@ bool is_swiotlb_active(void)
 EXPORT_SYMBOL_GPL(is_swiotlb_active);
 
 #ifdef CONFIG_DEBUG_FS
+static struct dentry *debugfs_dir;
 
-static int __init swiotlb_create_debugfs(void)
+static void swiotlb_create_debugfs_files(struct io_tlb_mem *mem)
 {
-   struct io_tlb_mem *mem = io_tlb_default_mem;
-
-   if (!mem)
-   return 0;
-   mem->debugfs = debugfs_create_dir("swiotlb", NULL);
debugfs_create_ulong("io_tlb_nslabs", 0400, mem->debugfs, &mem->nslabs);
debugfs_create_ulong("io_tlb_used", 0400, mem->debugfs, &mem->used);
+}
+
+static int __init swiotlb_create_default_debugfs(void)
+{
+   struct io_tlb_mem *mem = io_tlb_default_mem;
+
+   debugfs_dir = debugfs_create_dir("swiotlb", NULL);
+   if (mem) {
+   mem->debugfs = debugfs_dir;
+   swiotlb_create_debugfs_files(mem);
+   }
return 0;
 }
 
-late_initcall(swiotlb_create_debugfs);
+late_initcall(swiotlb_create_default_debugfs);
 
 #endif
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 01/12] swiotlb: Refactor swiotlb init functions

2021-06-18 Thread Claire Chang
Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
initialization to make the code reusable.

Signed-off-by: Claire Chang 
Reviewed-by: Christoph Hellwig 
Tested-by: Stefano Stabellini 
Tested-by: Will Deacon 
---
 kernel/dma/swiotlb.c | 50 ++--
 1 file changed, 25 insertions(+), 25 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 52e2ac526757..1f9b2b9e7490 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+   unsigned long nslabs, bool late_alloc)
 {
+   void *vaddr = phys_to_virt(start);
unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+
+   mem->nslabs = nslabs;
+   mem->start = start;
+   mem->end = mem->start + bytes;
+   mem->index = 0;
+   mem->late_alloc = late_alloc;
+   spin_lock_init(&mem->lock);
+   for (i = 0; i < mem->nslabs; i++) {
+   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
+   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
+   mem->slots[i].alloc_size = 0;
+   }
+   memset(vaddr, 0, bytes);
+}
+
+int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
+{
struct io_tlb_mem *mem;
size_t alloc_size;
 
@@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
if (!mem)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
-   mem->nslabs = nslabs;
-   mem->start = __pa(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
+
+   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
 
io_tlb_default_mem = mem;
if (verbose)
@@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
struct io_tlb_mem *mem;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!mem)
return -ENOMEM;
 
-   mem->nslabs = nslabs;
-   mem->start = virt_to_phys(tlb);
-   mem->end = mem->start + bytes;
-   mem->index = 0;
-   mem->late_alloc = 1;
-   spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
-   mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
-   mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
-   mem->slots[i].alloc_size = 0;
-   }
-
+   memset(mem, 0, sizeof(*mem));
set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
-   memset(tlb, 0, bytes);
+   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
 
io_tlb_default_mem = mem;
swiotlb_print_info();
-- 
2.32.0.288.g62a8d224e6-goog



[PATCH v14 00/12] Restricted DMA

2021-06-18 Thread Claire Chang
This series implements mitigations for lack of DMA access control on
systems without an IOMMU, which could result in the DMA accessing the
system memory at unexpected times and/or unexpected addresses, possibly
leading to data leakage or corruption.

For example, we plan to use the PCI-e bus for Wi-Fi and that PCI-e bus is
not behind an IOMMU. As PCI-e, by design, gives the device full access to
system memory, a vulnerability in the Wi-Fi firmware could easily escalate
to a full system exploit (remote wifi exploits: [1a], [1b] that shows a
full chain of exploits; [2], [3]).

To mitigate the security concerns, we introduce restricted DMA. Restricted
DMA utilizes the existing swiotlb to bounce streaming DMA in and out of a
specially allocated region and does memory allocation from the same region.
The feature on its own provides a basic level of protection against the DMA
overwriting buffer contents at unexpected times. However, to protect
against general data leakage and system memory corruption, the system needs
to provide a way to restrict the DMA to a predefined memory region (this is
usually done at firmware level, e.g. MPU in ATF on some ARM platforms [4]).
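
As a condensed sketch of the resulting streaming-DMA path (adapted from the
dma_direct_map_page() change in patch 06/12; dma_capable() and error
handling omitted for brevity):

	static dma_addr_t map_page_sketch(struct device *dev, struct page *page,
					  unsigned long offset, size_t size,
					  enum dma_data_direction dir,
					  unsigned long attrs)
	{
		phys_addr_t phys = page_to_phys(page) + offset;

		/* A restricted pool sets force_bounce, so every streaming
		 * mapping is bounced through the per-device swiotlb pool. */
		if (is_swiotlb_force_bounce(dev))
			return swiotlb_map(dev, phys, size, dir, attrs);

		return phys_to_dma(dev, phys);	/* direct mapping otherwise */
	}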

[1a] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_4.html
[1b] 
https://googleprojectzero.blogspot.com/2017/04/over-air-exploiting-broadcoms-wi-fi_11.html
[2] https://blade.tencent.com/en/advisories/qualpwn/
[3] 
https://www.bleepingcomputer.com/news/security/vulnerabilities-found-in-highly-popular-firmware-for-wifi-chips/
[4] 
https://github.com/ARM-software/arm-trusted-firmware/blob/master/plat/mediatek/mt8183/drivers/emi_mpu/emi_mpu.c#L132

v14:
- Move set_memory_decrypted before swiotlb_init_io_tlb_mem (patch 01/12, 10,12)
- Add Stefano's Acked-by tag from v13

v13:
- Fix xen-swiotlb issues
  - memset in patch 01/12
  - is_swiotlb_force_bounce in patch 06/12
- Fix the dts example typo in reserved-memory.txt
- Add Stefano and Will's Tested-by tag from v12
https://lore.kernel.org/patchwork/cover/1448001/

v12:
Split is_dev_swiotlb_force into is_swiotlb_force_bounce (patch 06/12) and
is_swiotlb_for_alloc (patch 09/12)
https://lore.kernel.org/patchwork/cover/1447254/

v11:
- Rebase against swiotlb devel/for-linus-5.14
- s/mempry/memory/g
- exchange the order of patch 09/12 and 10/12
https://lore.kernel.org/patchwork/cover/1447216/

v10:
Address the comments in v9 to
  - fix the dev->dma_io_tlb_mem assignment
  - propagate swiotlb_force setting into io_tlb_default_mem->force
  - move set_memory_decrypted out of swiotlb_init_io_tlb_mem
  - move debugfs_dir declaration into the main CONFIG_DEBUG_FS block
  - add swiotlb_ prefix to find_slots and release_slots
  - merge the 3 alloc/free related patches
  - move the CONFIG_DMA_RESTRICTED_POOL later
https://lore.kernel.org/patchwork/cover/1446882/

v9:
Address the comments in v7 to
  - set swiotlb active pool to dev->dma_io_tlb_mem
  - get rid of get_io_tlb_mem
  - dig out the device struct for is_swiotlb_active
  - move debugfs_create_dir out of swiotlb_create_debugfs
  - do set_memory_decrypted conditionally in swiotlb_init_io_tlb_mem
  - use IS_ENABLED in kernel/dma/direct.c
  - fix redefinition of 'of_dma_set_restricted_buffer'
https://lore.kernel.org/patchwork/cover/1445081/

v8:
- Fix reserved-memory.txt and add the reg property in example.
- Fix sizeof for of_property_count_elems_of_size in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Apply Will's suggestion to try the OF node having DMA configuration in
  drivers/of/address.c#of_dma_set_restricted_buffer.
- Fix typo in the comment of drivers/of/address.c#of_dma_set_restricted_buffer.
- Add error message for PageHighMem in
  kernel/dma/swiotlb.c#rmem_swiotlb_device_init and move it to
  rmem_swiotlb_setup.
- Fix the message string in rmem_swiotlb_setup.
https://lore.kernel.org/patchwork/cover/1437112/

v7:
Fix debugfs, PageHighMem and comment style in rmem_swiotlb_device_init
https://lore.kernel.org/patchwork/cover/1431031/

v6:
Address the comments in v5
https://lore.kernel.org/patchwork/cover/1423201/

v5:
Rebase on latest linux-next
https://lore.kernel.org/patchwork/cover/1416899/

v4:
- Fix spinlock bad magic
- Use rmem->name for debugfs entry
- Address the comments in v3
https://lore.kernel.org/patchwork/cover/1378113/

v3:
Using only one reserved memory region for both streaming DMA and memory
allocation.
https://lore.kernel.org/patchwork/cover/1360992/

v2:
Building on top of swiotlb.
https://lore.kernel.org/patchwork/cover/1280705/

v1:
Using dma_map_ops.
https://lore.kernel.org/patchwork/cover/1271660/


Claire Chang (12):
  swiotlb: Refactor swiotlb init functions
  swiotlb: Refactor swiotlb_create_debugfs
  swiotlb: Set dev->dma_io_tlb_mem to the swiotlb pool used
  swiotlb: Update is_swiotlb_buffer to add a struct device argument
  swiotlb: Update is_swiotlb_active to add a struct device argument
  swiotlb: Use is_swiotlb_force_bounce for swiotlb data bouncing
  swiotlb: Move alloc_size t

Re: [PATCH v6 13/17] powerpc/pseries/vas: Setup IRQ and fault handling

2021-06-18 Thread Nicholas Piggin
Excerpts from Haren Myneni's message of June 18, 2021 12:09 pm:
> On Fri, 2021-06-18 at 09:34 +1000, Nicholas Piggin wrote:
>> Excerpts from Haren Myneni's message of June 18, 2021 6:37 am:
>> > NX generates an interrupt when it sees a fault on the user space
>> > buffer and the hypervisor forwards that interrupt to the OS. Then
>> > the kernel handles the interrupt by issuing H_GET_NX_FAULT hcall
>> > to retrieve the fault CRB information.
>> > 
>> > This patch also adds changes to setup and free IRQ per each
>> > window and also handles the fault by updating the CSB.
>> 
>> In as much as this pretty well corresponds to the PowerNV code
>> AFAIKS,
>> it looks okay to me.
>> 
>> Reviewed-by: Nicholas Piggin 
>> 
>> Could you have an irq handler in your ops vector and have 
>> the core code set up the irq and call your handler, so the Linux irq
>> handling is in one place? Not something for this series, I was just
>> wondering.
> 
> It is not possible to have common core code for the IRQ setup.
> 
> PowerNV: Every VAS instance will be having IRQ and this setup will be
> done during initialization (system boot). A fault FIFO will be assigned
> for each instance and registered to VAS so that VAS/NX writes fault CRB
> into this FIFO.  
> 
> PowerVM: Each window will have an IRQ and the setup will be done during
> window open. 

Yeah, I thought as much. Just wondering.

Thanks,
Nick


Re: [PATCH v6 12/17] powerpc/pseries/vas: Integrate API with open/close windows

2021-06-18 Thread Nicholas Piggin
Excerpts from Haren Myneni's message of June 18, 2021 5:49 pm:
> On Fri, 2021-06-18 at 09:22 +1000, Nicholas Piggin wrote:
>> Excerpts from Haren Myneni's message of June 18, 2021 6:36 am:
>> > This patch adds VAS window allocation/close with the corresponding
>> > hcalls. Also changes to integrate with the existing user space VAS
>> > API and provide register/unregister functions to NX pseries driver.
>> > 
>> > The driver register function is used to create the user space
>> > interface (/dev/crypto/nx-gzip) and unregister to remove this
>> > entry.
>> > 
>> > The user space process opens this device node and makes an ioctl
>> > to allocate VAS window. The close interface is used to deallocate
>> > window.
>> > 
>> > Signed-off-by: Haren Myneni 
>> 
>> Reviewed-by: Nicholas Piggin 
>> 
>> Unless there is some significant performance reason it might be
>> simplest
>> to take the mutex for the duration of the allocate and frees rather
>> than 
>> taking it several times, covering the atomic with the lock instead.
>> 
>> You have a big lock, might as well use it and not have to wonder what
>> if 
>> things race here or there.
> 
> Using mutex to protect allocate/deallocate window and setup/free IRQ,
> also to protect updating the list. We do not need lock for modify
> window hcall and other things. Hence taking mutex several times.

Right, at which point you have to consider what happens with 
interleaving allocates and deallocates. I'm not saying it's wrong, just 
that if you do credential allocation, hcall allocation, irq allocation, 
and list insertion all under the one lock, and remove it all under the 
one lock, concurrency requires less attention.
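
For the sake of discussion, the shape being suggested is roughly the
following (all names here are hypothetical, not from this series):

	/* Sketch: one critical section around the whole open path, so a
	 * concurrent open/close can never observe a half-built window. */
	static int vas_open_window_sketch(struct vas_ctx *ctx)
	{
		int rc;

		mutex_lock(&vas_lock);
		rc = reserve_credit(ctx);		/* counter update */
		if (!rc)
			rc = hcall_allocate_window(ctx);
		if (!rc)
			rc = setup_fault_irq(ctx);	/* per-window IRQ */
		if (!rc)
			list_add(&ctx->node, &vas_windows);
		else
			unwind_partial(ctx);	/* release whatever succeeded */
		mutex_unlock(&vas_lock);

		return rc;
	}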


> Also
> used atomic for counters (used_lpar_creds) which can be exported in
> sysfs (this patch will be added later in the next enhancement series).

That's okay you can use mutexes for that too if that's how you're
protecting them.

> 
> Generally, applications open a window initially, do continuous copy/paste
> operations and close the window later. But it is possible that the library /
> application opens/closes a window for each request. It may also be opening
> or closing multiple windows (say 1000, depending on the cores on the system)
> at the same time. These cases may affect the application performance.

It definitely could if you have a lot of concurrent open/close, but
the code as is won't handle it all that well either, so there's the
question of what is reasonable to do and what is reasonable to add
concurrency complexity for.

As I said, you've got it working and seem to have covered all cases now 
so let's get the series in first. But something to consider changing
IMO.

Thanks,
Nick


Re: [PATCH v2 9/9] powerpc/boot: Add a boot wrapper for Microwatt

2021-06-18 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 18, 2021 1:49 pm:
> From: Joel Stanley 
> 
> This allows microwatt's kernel to be built with an embedded device tree.
> 
> Load to arch/powerpc/boot/dtbImage.microwatt to 0x50:
> 
>  mw_debug -b fpga stop load arch/powerpc/boot/dtbImage.microwatt 50 start
> 
> Signed-off-by: Joel Stanley 
> Signed-off-by: Paul Mackerras 

Thanks for folding and commenting that change. Ack for this and the rest 
of the platform and dt and interrupt handling patches FWIW, but I don't 
know much about any of these areas to give an informed review.

Thanks,
Nick

> ---
>  arch/powerpc/boot/Makefile|  4 
>  arch/powerpc/boot/microwatt.c | 24 
>  arch/powerpc/boot/wrapper |  5 +
>  3 files changed, 33 insertions(+)
>  create mode 100644 arch/powerpc/boot/microwatt.c
> 
> diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
> index 2b8da923ceca..dfaa4094fcae 100644
> --- a/arch/powerpc/boot/Makefile
> +++ b/arch/powerpc/boot/Makefile
> @@ -163,6 +163,8 @@ src-plat-$(CONFIG_PPC_POWERNV) += pseries-head.S
>  src-plat-$(CONFIG_PPC_IBM_CELL_BLADE) += pseries-head.S
>  src-plat-$(CONFIG_MVME7100) += motload-head.S mvme7100.c
>  
> +src-plat-$(CONFIG_PPC_MICROWATT) += fixed-head.S microwatt.c
> +
>  src-wlib := $(sort $(src-wlib-y))
>  src-plat := $(sort $(src-plat-y))
>  src-boot := $(src-wlib) $(src-plat) empty.c
> @@ -355,6 +357,8 @@ image-$(CONFIG_MVME5100)  += dtbImage.mvme5100
>  # Board port in arch/powerpc/platform/amigaone/Kconfig
>  image-$(CONFIG_AMIGAONE) += cuImage.amigaone
>  
> +image-$(CONFIG_PPC_MICROWATT)+= dtbImage.microwatt
> +
>  # For 32-bit powermacs, build the COFF and miboot images
>  # as well as the ELF images.
>  ifdef CONFIG_PPC32
> diff --git a/arch/powerpc/boot/microwatt.c b/arch/powerpc/boot/microwatt.c
> new file mode 100644
> index ..ca9d83617fc1
> --- /dev/null
> +++ b/arch/powerpc/boot/microwatt.c
> @@ -0,0 +1,24 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +
> +#include 
> +#include "stdio.h"
> +#include "types.h"
> +#include "io.h"
> +#include "ops.h"
> +
> +BSS_STACK(8192);
> +
> +void platform_init(unsigned long r3, unsigned long r4, unsigned long r5)
> +{
> + unsigned long heapsize = 16*1024*1024 - (unsigned long)_end;
> +
> + /*
> +  * Disable interrupts and turn off MSR_RI, since we'll
> +  * shortly be overwriting the interrupt vectors.
> +  */
> + __asm__ volatile("mtmsrd %0,1" : : "r" (0));
> +
> + simple_alloc_init(_end, heapsize, 32, 64);
> + fdt_init(_dtb_start);
> + serial_console_init();
> +}
> diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
> index 41fa0a8715e3..ae48fffa1e13 100755
> --- a/arch/powerpc/boot/wrapper
> +++ b/arch/powerpc/boot/wrapper
> @@ -342,6 +342,11 @@ gamecube|wii)
>  link_address='0x60'
>  platformo="$object/$platform-head.o $object/$platform.o"
>  ;;
> +microwatt)
> +link_address='0x50'
> +platformo="$object/fixed-head.o $object/$platform.o"
> +binary=y
> +;;
>  treeboot-currituck)
>  link_address='0x100'
>  ;;
> -- 
> 2.31.1
> 
> 


Re: [PATCH v2 8/9] powerpc/boot: Fixup device-tree on little endian

2021-06-18 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 18, 2021 1:49 pm:
> From: Benjamin Herrenschmidt 
> 
> This fixes the core devtree.c functions and the ns16550 UART backend.

Looks okay. Can sparse be run on arch/powerpc/boot? Would be nice to
get that working and endian annotations at some point.

> @@ -240,7 +249,6 @@ static int dt_xlate(void *node, int res, int reglen, 
> unsigned long *addr,
>   return 0;
>  
>   dt_get_reg_format(parent, &naddr, &nsize);
> -
>   if (nsize > 2)
>   return 0;
>  

Unrelated hunk.

> @@ -278,7 +286,6 @@ static int dt_xlate(void *node, int res, int reglen, 
> unsigned long *addr,
>  
>   offset = find_range(last_addr, prop_buf, prev_naddr,
>   naddr, prev_nsize, buflen / 4);
> -
>   if (offset < 0)
>   return 0;
>  

And there.

> diff --git a/arch/powerpc/boot/ns16550.c b/arch/powerpc/boot/ns16550.c
> index b0da4466d419..f16d2be1d0f3 100644
> --- a/arch/powerpc/boot/ns16550.c
> +++ b/arch/powerpc/boot/ns16550.c
> @@ -15,6 +15,7 @@
>  #include "stdio.h"
>  #include "io.h"
>  #include "ops.h"
> +#include "of.h"
>  
>  #define UART_DLL 0   /* Out: Divisor Latch Low */
>  #define UART_DLM 1   /* Out: Divisor Latch High */
> @@ -58,16 +59,20 @@ int ns16550_console_init(void *devp, struct 
> serial_console_data *scdp)
>   int n;
>   u32 reg_offset;
>  
> - if (dt_get_virtual_reg(devp, (void **)®_base, 1) < 1)
> + if (dt_get_virtual_reg(devp, (void **)®_base, 1) < 1) {
> + printf("virt reg parse fail...\r\n");
>   return -1;
> + }

Leftover debug.  Otherwise,

Acked-by: Nicholas Piggin 

Thanks,
Nick


Re: [PATCH v2 6/9] powerpc/microwatt: Add support for hardware random number generator

2021-06-18 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 18, 2021 1:47 pm:
> Microwatt's hardware RNG is accessed using the DARN instruction.
> 

I think we're getting a platforms/book3s soon with the VAS patches, 
might be a place to add the get_random_darn function.

Huh, DARN is unprivileged, right? And yet we haven't wired it up in
pseries; it still uses an hcall.

Anyway that's all stuff to sort out later.

Reviewed-by: Nicholas Piggin 

> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/platforms/microwatt/Kconfig  |  1 +
>  arch/powerpc/platforms/microwatt/Makefile |  2 +-
>  arch/powerpc/platforms/microwatt/rng.c| 48 +++
>  3 files changed, 50 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/platforms/microwatt/rng.c
> 
> diff --git a/arch/powerpc/platforms/microwatt/Kconfig 
> b/arch/powerpc/platforms/microwatt/Kconfig
> index 50ed0cedb5f1..8f6a81978461 100644
> --- a/arch/powerpc/platforms/microwatt/Kconfig
> +++ b/arch/powerpc/platforms/microwatt/Kconfig
> @@ -7,6 +7,7 @@ config PPC_MICROWATT
>   select PPC_ICP_NATIVE
>   select PPC_NATIVE
>   select PPC_UDBG_16550
> + select ARCH_RANDOM
>   help
>This option enables support for FPGA-based Microwatt 
> implementations.
>  
> diff --git a/arch/powerpc/platforms/microwatt/Makefile 
> b/arch/powerpc/platforms/microwatt/Makefile
> index e6885b3b2ee7..116d6d3ad3f0 100644
> --- a/arch/powerpc/platforms/microwatt/Makefile
> +++ b/arch/powerpc/platforms/microwatt/Makefile
> @@ -1 +1 @@
> -obj-y+= setup.o
> +obj-y+= setup.o rng.o
> diff --git a/arch/powerpc/platforms/microwatt/rng.c 
> b/arch/powerpc/platforms/microwatt/rng.c
> new file mode 100644
> index ..3d8ee6eb7dad
> --- /dev/null
> +++ b/arch/powerpc/platforms/microwatt/rng.c
> @@ -0,0 +1,48 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Derived from arch/powerpc/platforms/powernv/rng.c, which is:
> + * Copyright 2013, Michael Ellerman, IBM Corporation.
> + */
> +
> +#define pr_fmt(fmt)  "microwatt-rng: " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define DARN_ERR 0xul
> +
> +int microwatt_get_random_darn(unsigned long *v)
> +{
> + unsigned long val;
> +
> + /* Using DARN with L=1 - 64-bit conditioned random number */
> + asm volatile(PPC_DARN(%0, 1) : "=r"(val));
> +
> + if (val == DARN_ERR)
> + return 0;
> +
> + *v = val;
> +
> + return 1;
> +}
> +
> +static __init int rng_init(void)
> +{
> + unsigned long val;
> + int i;
> +
> + for (i = 0; i < 10; i++) {
> + if (microwatt_get_random_darn(&val)) {
> + ppc_md.get_random_seed = microwatt_get_random_darn;
> + return 0;
> + }
> + }
> +
> + pr_warn("Unable to use DARN for get_random_seed()\n");
> +
> + return -EIO;
> +}
> +machine_subsys_initcall(microwatt, rng_init);
> -- 
> 2.31.1
> 
> 


Re: [PATCH v2 1/9] powerpc: Add Microwatt platform

2021-06-18 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 18, 2021 1:43 pm:
> Microwatt is a FPGA-based implementation of the Power ISA.  It
> currently only implements little-endian 64-bit mode, and does
> not (yet) support SMP, VMX, VSX or transactional memory.  It has an
> optional FPU, and an optional MMU (required for running Linux,
> obviously) which implements a configurable radix tree but not
> hypervisor mode or nested radix translation.
> 
> This adds a new machine type to support FPGA-based SoCs with a
> Microwatt core.  CONFIG_MATH_EMULATION can be selected for Microwatt
> SOCs which don't have the FPU.

The only thing I can think of is you may want to select PPC_RADIX and 
other possible things that are required, but that's not a big deal at 
the moment. I have a few kernel size reduction config patches (like 
CONFIG_PPC_HASH) I might be able to upstream now for Microwatt, so I
could do a bit of a pass over the Kconfig stuff at that point.

Reviewed-by: Nicholas Piggin 

> 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/Kconfig  |  2 +-
>  arch/powerpc/platforms/Kconfig|  1 +
>  arch/powerpc/platforms/Makefile   |  1 +
>  arch/powerpc/platforms/microwatt/Kconfig  |  9 +
>  arch/powerpc/platforms/microwatt/Makefile |  1 +
>  arch/powerpc/platforms/microwatt/setup.c  | 23 +++
>  6 files changed, 36 insertions(+), 1 deletion(-)
>  create mode 100644 arch/powerpc/platforms/microwatt/Kconfig
>  create mode 100644 arch/powerpc/platforms/microwatt/Makefile
>  create mode 100644 arch/powerpc/platforms/microwatt/setup.c
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 386ae12d8523..5ce51c38a346 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -422,7 +422,7 @@ config HUGETLB_PAGE_SIZE_VARIABLE
>  
>  config MATH_EMULATION
>   bool "Math emulation"
> - depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE
> + depends on 4xx || PPC_8xx || PPC_MPC832x || BOOKE || PPC_MICROWATT
>   select PPC_FPU_REGS
>   help
> Some PowerPC chips designed for embedded applications do not have
> diff --git a/arch/powerpc/platforms/Kconfig b/arch/powerpc/platforms/Kconfig
> index 7a5e8f4541e3..74be4d06afbf 100644
> --- a/arch/powerpc/platforms/Kconfig
> +++ b/arch/powerpc/platforms/Kconfig
> @@ -20,6 +20,7 @@ source "arch/powerpc/platforms/embedded6xx/Kconfig"
>  source "arch/powerpc/platforms/44x/Kconfig"
>  source "arch/powerpc/platforms/40x/Kconfig"
>  source "arch/powerpc/platforms/amigaone/Kconfig"
> +source "arch/powerpc/platforms/microwatt/Kconfig"
>  
>  config KVM_GUEST
>   bool "KVM Guest support"
> diff --git a/arch/powerpc/platforms/Makefile b/arch/powerpc/platforms/Makefile
> index 143d4417f6cc..edcb54cdb1a8 100644
> --- a/arch/powerpc/platforms/Makefile
> +++ b/arch/powerpc/platforms/Makefile
> @@ -22,3 +22,4 @@ obj-$(CONFIG_PPC_CELL)  += cell/
>  obj-$(CONFIG_PPC_PS3)+= ps3/
>  obj-$(CONFIG_EMBEDDED6xx)+= embedded6xx/
>  obj-$(CONFIG_AMIGAONE)   += amigaone/
> +obj-$(CONFIG_PPC_MICROWATT)  += microwatt/
> diff --git a/arch/powerpc/platforms/microwatt/Kconfig 
> b/arch/powerpc/platforms/microwatt/Kconfig
> new file mode 100644
> index ..3be01e78ce57
> --- /dev/null
> +++ b/arch/powerpc/platforms/microwatt/Kconfig
> @@ -0,0 +1,9 @@
> +# SPDX-License-Identifier: GPL-2.0
> +config PPC_MICROWATT
> + depends on PPC_BOOK3S_64 && !SMP
> + bool "Microwatt SoC platform"
> + select PPC_XICS
> + select PPC_NATIVE
> + help
> +  This option enables support for FPGA-based Microwatt 
> implementations.
> +
> diff --git a/arch/powerpc/platforms/microwatt/Makefile 
> b/arch/powerpc/platforms/microwatt/Makefile
> new file mode 100644
> index ..e6885b3b2ee7
> --- /dev/null
> +++ b/arch/powerpc/platforms/microwatt/Makefile
> @@ -0,0 +1 @@
> +obj-y+= setup.o
> diff --git a/arch/powerpc/platforms/microwatt/setup.c 
> b/arch/powerpc/platforms/microwatt/setup.c
> new file mode 100644
> index ..d80d52612672
> --- /dev/null
> +++ b/arch/powerpc/platforms/microwatt/setup.c
> @@ -0,0 +1,23 @@
> +/*
> + * Microwatt FPGA-based SoC platform setup code.
> + *
> + * Copyright 2020 Paul Mackerras (pau...@ozlabs.org), IBM Corp.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static int __init microwatt_probe(void)
> +{
> + return of_machine_is_compatible("microwatt-soc");
> +}
> +
> +define_machine(microwatt) {
> + .name   = "microwatt",
> + .probe  = microwatt_probe,
> + .calibrate_decr = generic_calibrate_decr,
> +};
> -- 
> 2.31.1
> 
> 


Re: [PATCH v2 5/9] powerpc/microwatt: Use standard 16550 UART for console

2021-06-18 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 18, 2021 10:12 pm:
> On Fri, Jun 18, 2021 at 05:40:40PM +1000, Nicholas Piggin wrote:
>> Excerpts from Paul Mackerras's message of June 18, 2021 1:46 pm:
>> > From: Benjamin Herrenschmidt 
>> > 
>> > This adds support to the Microwatt platform to use the standard
>> > 16550-style UART which is available in the standalone Microwatt FPGA.
>> > 
>> > Signed-off-by: Benjamin Herrenschmidt 
>> > Signed-off-by: Paul Mackerras 
> ...
>> > +#ifdef CONFIG_PPC_EARLY_DEBUG_MICROWATT
>> > +
>> > +#define UDBG_UART_MW_ADDR ((void __iomem *)0xc0002000)
>> > +
>> > +static u8 udbg_uart_in_isa300_rm(unsigned int reg)
>> > +{
>> > +  uint64_t msr = mfmsr();
>> > +  uint8_t  c;
>> > +
>> > +  mtmsr(msr & ~(MSR_EE|MSR_DR));
>> > +  isync();
>> > +  eieio();
>> > +  c = __raw_rm_readb(UDBG_UART_MW_ADDR + (reg << 2));
>> > +  mtmsr(msr);
>> > +  isync();
>> > +  return c;
>> > +}
>> 
>> Why is realmode required? No cache inhibited mappings yet?
> 
> Because it's EARLY debug, for use in the very early stages of boot
> when the kernel's radix tree may or may not have been initialized.
> The easiest way to make a function that works correctly whether or not
> the radix tree has been initialized and the MMU turned on is to
> temporarily turn off the MMU for data accesses and use lbzcix/stbcix

Ah makes sense.

> (which Microwatt has, even though it doesn't implement hypervisor
> mode).
> 
> (I don't know which "yet" you meant - "yet" in the process of booting a
> kernel, or "yet" in the process of Microwatt's development?  Microwatt
> certainly does have cache-inhibited mappings and has done since the
> MMU was first introduced.)

I did mean mappings to the UART, but good to get both answers :D

> 
> In fact the defconfig I add later in the series doesn't enable
> CONFIG_PPC_EARLY_DEBUG_MICROWATT, but it's there if it's needed for
> debugging.
> 
>> mtmsrd with L=0 is defined to be context synchronizing in isa 3, so I 
>> don't think the isync would be required. There is a bit of code around 
>> arch/powerpc that does this, maybe it used to be needed or some other
>> implementations needed it?
>> 
>> That's just for my curiosity, it doesn't really hurt to have them
>> there.
> 
> Right, and in fact mtmsrd is marked as a single-issue instruction in
> Microwatt, so it should work with no isyncs or eieios.  Presumably Ben
> copied the isync/eieio pattern from somewhere else.

Makes sense. Well I don't have any objection to the series.

Thanks,
Nick


Re: linux-next: manual merge of the akpm-current tree with the powerpc tree

2021-06-18 Thread Nicholas Piggin
Excerpts from Stephen Rothwell's message of June 18, 2021 7:44 pm:
> Hi all,
> 
> Today's linux-next merge of the akpm-current tree got a conflict in:
> 
>   arch/powerpc/kernel/smp.c
> 
> between commit:
> 
>   86f46f343272 ("powerpc/32s: Initialise KUAP and KUEP in C")
> 
> from the powerpc tree and commit:
> 
>   103e676c91d0 ("lazy tlb: introduce lazy mm refcount helper functions")
> 
> from the akpm-current tree.
> 
> I fixed it up (see below) and can carry the fix as necessary. This
> is now fixed as far as linux-next is concerned, but any non trivial
> conflicts should be mentioned to your upstream maintainer when your tree
> is submitted for merging.  You may also want to consider cooperating
> with the maintainer of the conflicting tree to minimise any particularly
> complex conflicts.

This is the correct merge.

Thanks,
Nick

> 
> -- 
> Cheers,
> Stephen Rothwell
> 
> diff --cc arch/powerpc/kernel/smp.c
> index b83a59ce9beb,b289f1d213f8..
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@@ -1541,11 -1541,7 +1541,11 @@@ void start_secondary(void *unused
>   {
>   unsigned int cpu = raw_smp_processor_id();
>   
>  +/* PPC64 calls setup_kup() in early_setup_secondary() */
>  +if (IS_ENABLED(CONFIG_PPC32))
>  +setup_kup();
>  +
> - mmgrab(&init_mm);
> + mmgrab_lazy_tlb(&init_mm);
>   current->active_mm = &init_mm;
>   
>   smp_store_cpu_info(cpu);
> 


Re: [PATCH] crypto: nx: Fix memcpy() over-reading in nonce

2021-06-18 Thread Kees Cook
On Thu, Jun 17, 2021 at 04:08:15PM +1000, Michael Ellerman wrote:
> Kees Cook  writes:
> > Fix typo in memcpy() where size should be CTR_RFC3686_NONCE_SIZE.
> >
> > Fixes: 030f4e968741 ("crypto: nx - Fix reentrancy bugs")
> > Cc: sta...@vger.kernel.org
> > Signed-off-by: Kees Cook 
> 
> Thanks.
> 
> > ---
> >  drivers/crypto/nx/nx-aes-ctr.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/crypto/nx/nx-aes-ctr.c b/drivers/crypto/nx/nx-aes-ctr.c
> > index 13f518802343..6120e350ff71 100644
> > --- a/drivers/crypto/nx/nx-aes-ctr.c
> > +++ b/drivers/crypto/nx/nx-aes-ctr.c
> > @@ -118,7 +118,7 @@ static int ctr3686_aes_nx_crypt(struct skcipher_request 
> > *req)
> > struct nx_crypto_ctx *nx_ctx = crypto_skcipher_ctx(tfm);
> > u8 iv[16];
> >  
> > -   memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_IV_SIZE);
> > +   memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_NONCE_SIZE);
> > memcpy(iv + CTR_RFC3686_NONCE_SIZE, req->iv, CTR_RFC3686_IV_SIZE);
> > iv[12] = iv[13] = iv[14] = 0;
> > iv[15] = 1;
> 
> Where IV_SIZE is 8 and NONCE_SIZE is 4.
> 
> And iv is 16 bytes, so it's not a buffer overflow.
> 
> But priv.ctr.nonce is 4 bytes, and at the end of the struct, so it reads
> 4 bytes past the end of the nx_crypto_ctx, which is not good.
> 
> But then immediately overwrites whatever it read with req->iv.
> 
> So seems pretty harmless in practice?

Right -- there's no damage done, but future memcpy() FORTIFY work alerts
on this, so I'm going through cleaning all of these up. :)

-- 
Kees Cook


[Bug 213079] [bisected] IRQ problems and crashes on a PowerMac G5 with 5.12.3

2021-06-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213079

--- Comment #11 from Erhard F. (erhar...@mailbox.org) ---
Created attachment 297473
  --> https://bugzilla.kernel.org/attachment.cgi?id=297473&action=edit
dmesg (5.13-rc6 + DEBUG_VM_PGTABLE, PowerMac G5 11,2)

The trace got some additional data with DEBUG_VM_PGTABLE=y, slub_debug=P and
page_poison=1:

[...]
irq 63: nobody cared (try booting with the "irqpoll" option)
CPU: 0 PID: 0 Comm: swapper/0 Tainted: GW
5.13.0-rc6-PowerMacG5+ #2
Call Trace:
[cfff7ae0] [c054eafc] .dump_stack+0xe0/0x13c (unreliable)
[cfff7b80] [c00e1428] .__report_bad_irq+0x34/0xf0
[cfff7c20] [c00e1310] .note_interrupt+0x258/0x300
[cfff7ce0] [c00dd58c] .handle_irq_event_percpu+0x64/0x90
[cfff7d70] [c00dd5fc] .handle_irq_event+0x44/0x70
[cfff7e00] [c00e2a14] .handle_fasteoi_irq+0xac/0x158
[cfff7ea0] [c00dc648] .generic_handle_irq+0x38/0x58
[cfff7f10] [c0011688] .__do_irq+0x15c/0x238
[cfff7f90] [c001207c] .do_IRQ+0x180/0x188
[c12db810] [c0011f9c] .do_IRQ+0xa0/0x188
[c12db8b0] [c0007f94]
hardware_interrupt_common_virt+0x1a4/0x1b0
--- interrupt: 500 at .power4_idle_nap+0x30/0x34
NIP:  c002cc04 LR: c0016828 CTR: c0016768
REGS: c12db920 TRAP: 0500   Tainted: GW 
(5.13.0-rc6-PowerMacG5+)
MSR:  90009032   CR: 44082242  XER: 
IRQMASK: 0 
GPR00: c00167dc c12dbbc0 c12df700 0001 
GPR04:   0002 90049032 
GPR08: 0001 c11b3b80 0001 0016 
GPR12: 44082242 c23a6000 0014aa88 ffb30100 
GPR16: 01e7b8da 01e7bd5f 01e7b9f0 01e88d8d 
GPR20: 01e7bd3d 01e7b98b 01e7bbb2 01e7b89c 
GPR24: 0270f700 c1081008 c0a7c02d  
GPR28: c12edb9c c11b3b80 90009032 c12ed985 
NIP [c002cc04] .power4_idle_nap+0x30/0x34
LR [c0016828] .power4_idle+0xc0/0xe8
--- interrupt: 500
[c12dbbc0] [c00167dc] .power4_idle+0x74/0xe8 (unreliable)
[c12dbc40] [c001665c] .arch_cpu_idle+0x80/0x18c
[c12dbcc0] [c081f058] .default_idle_call+0x7c/0xd0
[c12dbd30] [c00a7bcc] .do_idle+0x128/0x140
[c12dbdd0] [c00a7eb4] .cpu_startup_entry+0x28/0x2c
[c12dbe40] [c0010044] .rest_init+0x1b0/0x1bc
[c12dbec0] [c10047f4] .start_kernel+0x934/0x9b8
[c12dbf90] [c000b390] start_here_common+0x1c/0x8c
handlers:
[<1553d54b>] .nvme_irq
[<1553d54b>] .nvme_irq
Disabling IRQ #63


[Bug 213069] kernel BUG at arch/powerpc/include/asm/book3s/64/hash-4k.h:147! Oops: Exception in kernel mode, sig: 5 [#1]

2021-06-18 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=213069

Erhard F. (erhar...@mailbox.org) changed:

   What|Removed |Added

 Status|REOPENED|RESOLVED
 Resolution|--- |PATCH_ALREADY_AVAILABLE

--- Comment #5 from Erhard F. (erhar...@mailbox.org) ---
Works now with Anshuman's patch applied:
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=c0dd65c2e295762fc5ebd11f8da3ac33f1d30788

Thanks!


Re: [PATCH v4 07/11] powerpc/pseries/iommu: Reorganize iommu_table_setparms*() with new helper

2021-06-18 Thread Leonardo Brás
Hello Alexey, thanks for this feedback!

On Mon, 2021-05-10 at 17:34 +1000, Alexey Kardashevskiy wrote:
> 
> 
> This does not apply anymore as it conflicts with my 4be518d838809e2135.

ok, rebasing on top of torvalds/master

> 
> 
> > ---
> >   arch/powerpc/platforms/pseries/iommu.c | 100 --
> > ---
> >   1 file changed, 50 insertions(+), 50 deletions(-)
> > 
> > diff --git a/arch/powerpc/platforms/pseries/iommu.c
> > b/arch/powerpc/platforms/pseries/iommu.c
> > index 5a70ecd579b8..89cb6e9e9f31 100644
> > --- a/arch/powerpc/platforms/pseries/iommu.c
> > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > @@ -53,6 +53,11 @@ enum {
> > DDW_EXT_QUERY_OUT_SIZE = 2
> >   };
> >   
> > +#ifdef CONFIG_IOMMU_API
> > +static int tce_exchange_pseries(struct iommu_table *tbl, long index,
> > unsigned long *tce,
> > +   enum dma_data_direction *direction,
> > bool realmode);
> > +#endif
> 
> 
> Instead of declaring this so far from the the code which needs it, may 
> be add
> 
> struct iommu_table_ops iommu_table_lpar_multi_ops;
> 
> right before iommu_table_setparms() (as the struct is what you
> actually 
> want there),
>  and you won't need to move iommu_table_pseries_ops as well.

Oh, I was not aware I could declare a variable and then define it
statically. 
I mean, it does make sense, but I never thought of that.

I will change that :)

> 
> 
> > +
> >   static struct iommu_table *iommu_pseries_alloc_table(int node)
> >   {
> > struct iommu_table *tbl;
> > @@ -501,6 +506,28 @@ static int
> > tce_setrange_multi_pSeriesLP_walk(unsigned long start_pfn,
> > return tce_setrange_multi_pSeriesLP(start_pfn, num_pfn, arg);
> >   }
> >   
> > +static inline void _iommu_table_setparms(struct iommu_table *tbl,
> > unsigned long busno,
> 
> 
> The underscore is confusing, maybe iommu_table_do_setparms()? 
> iommu_table_setparms_common()? Not sure. I cannot recall a single 
> function with just one leading underscore, I suspect I was pushed back 
> when I tried adding one ages ago :) "inline" seems excessive, the 
> compiler will probably figure it out anyway.
> 
> 

sure, done.


> 
> > +    unsigned long liobn,
> > unsigned long win_addr,
> > +    unsigned long window_size,
> > unsigned long page_shift,
> > +    unsigned long base, struct
> > iommu_table_ops *table_ops)
> 
> 
> Make "base" a pointer. Or, better, just keep setting it directly in 
> iommu_table_setparms() rather than passing 0 around.
> 
> The same comment about "liobn" - set it in iommu_table_setparms_lpar().
> The reviewer will see what field matters in what situation imho.
> 

The idea here was to keep all tbl parameters setting in
_iommu_table_setparms (or iommu_table_setparms_common).

I understand the idea that each one of those is optional in the other
case, but should we keep whatever value is present in the other
variable (not zeroing the other variable), or do something like:

tbl->it_index = 0;
tbl->it_base = basep;
(in iommu_table_setparms)

tbl->it_index = liobn;
tbl->it_base = 0;
(in iommu_table_setparms_lpar)


> 
> > +{
> > +   tbl->it_busno = busno;
> > +   tbl->it_index = liobn;
> > +   tbl->it_offset = win_addr >> page_shift;
> > +   tbl->it_size = window_size >> page_shift;
> > +   tbl->it_page_shift = page_shift;
> > +   tbl->it_base = base;
> > +   tbl->it_blocksize = 16;
> > +   tbl->it_type = TCE_PCI;
> > +   tbl->it_ops = table_ops;
> > +}
> > +
> > +struct iommu_table_ops iommu_table_pseries_ops = {
> > +   .set = tce_build_pSeries,
> > +   .clear = tce_free_pSeries,
> > +   .get = tce_get_pseries
> > +};
> > +
> >   static void iommu_table_setparms(struct pci_controller *phb,
> >  struct device_node *dn,
> >  struct iommu_table *tbl)
> > @@ -509,8 +536,13 @@ static void iommu_table_setparms(struct
> > pci_controller *phb,
> > const unsigned long *basep;
> > const u32 *sizep;
> >   
> > -   node = phb->dn;
> > +   /* Test if we are going over 2GB of DMA space */
> > +   if (phb->dma_window_base_cur + phb->dma_window_size > SZ_2G)
> > {
> > +   udbg_printf("PCI_DMA: Unexpected number of IOAs under
> > this PHB.\n");
> > +   panic("PCI_DMA: Unexpected number of IOAs under this
> > PHB.\n");
> > +   }
> >   
> > +   node = phb->dn;
> > basep = of_get_property(node, "linux,tce-base", NULL);
> > sizep = of_get_property(node, "linux,tce-size", NULL);
> > if (basep == NULL || sizep == NULL) {
> > @@ -519,33 +551,25 @@ static void iommu_table_setparms(struct
> > pci_controller *phb,
> > return;
> > }
> >   
> > -   tbl->it_base = (unsigned long)__va(*basep);
> > +   _iommu_table_setparms(tbl, phb->bus->number, 0, phb-
> > >dma_window_base_cur,
>

Re: [PATCH v3] lockdown,selinux: fix wrong subject in some SELinux lockdown checks

2021-06-18 Thread Dan Williams
On Wed, Jun 16, 2021 at 1:51 AM Ondrej Mosnacek  wrote:
>
> Commit 59438b46471a ("security,lockdown,selinux: implement SELinux
> lockdown") added an implementation of the locked_down LSM hook to
> SELinux, with the aim to restrict which domains are allowed to perform
> operations that would breach lockdown.
>
> However, in several places the security_locked_down() hook is called in
> situations where the current task isn't doing any action that would
> directly breach lockdown, leading to SELinux checks that are basically
> bogus.
>
> To fix this, add an explicit struct cred pointer argument to
> security_lockdown() and define NULL as a special value to pass instead
> of current_cred() in such situations. LSMs that take the subject
> credentials into account can then fall back to some default or ignore
> such calls altogether. In the SELinux lockdown hook implementation, use
> SECINITSID_KERNEL in case the cred argument is NULL.
>
> Most of the callers are updated to pass current_cred() as the cred
> pointer, thus maintaining the same behavior. The following callers are
> modified to pass NULL as the cred pointer instead:
> 1. arch/powerpc/xmon/xmon.c
>  Seems to be some interactive debugging facility. It appears that
>  the lockdown hook is called from interrupt context here, so it
>  should be more appropriate to request a global lockdown decision.
> 2. fs/tracefs/inode.c:tracefs_create_file()
>  Here the call is used to prevent creating new tracefs entries when
>  the kernel is locked down. Assumes that locking down is one-way -
>  i.e. if the hook returns non-zero once, it will never return zero
>  again, thus no point in creating these files. Also, the hook is
>  often called by a module's init function when it is loaded by
>  userspace, where it doesn't make much sense to do a check against
>  the current task's creds, since the task itself doesn't actually
>  use the tracing functionality (i.e. doesn't breach lockdown), just
>  indirectly makes some new tracepoints available to whoever is
>  authorized to use them.
> 3. net/xfrm/xfrm_user.c:copy_to_user_*()
>  Here a cryptographic secret is redacted based on the value returned
>  from the hook. There are two possible actions that may lead here:
>  a) A netlink message XFRM_MSG_GETSA with NLM_F_DUMP set - here the
> task context is relevant, since the dumped data is sent back to
> the current task.
>  b) When adding/deleting/updating an SA via XFRM_MSG_xxxSA, the
> dumped SA is broadcasted to tasks subscribed to XFRM events -
> here the current task context is not relevant as it doesn't
> represent the tasks that could potentially see the secret.
>  It doesn't seem worth it to try to keep using the current task's
>  context in the a) case, since the eventual data leak can be
>  circumvented anyway via b), plus there is no way for the task to
>  indicate that it doesn't care about the actual key value, so the
>  check could generate a lot of "false alert" denials with SELinux.
>  Thus, let's pass NULL instead of current_cred() here faute de
>  mieux.
>
> Improvements-suggested-by: Casey Schaufler 
> Improvements-suggested-by: Paul Moore 
> Fixes: 59438b46471a ("security,lockdown,selinux: implement SELinux lockdown")
> Signed-off-by: Ondrej Mosnacek 
[..]
> diff --git a/drivers/cxl/mem.c b/drivers/cxl/mem.c
> index 2acc6173da36..c1747b6555c7 100644
> --- a/drivers/cxl/mem.c
> +++ b/drivers/cxl/mem.c
> @@ -568,7 +568,7 @@ static bool cxl_mem_raw_command_allowed(u16 opcode)
> if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
> return false;
>
> -   if (security_locked_down(LOCKDOWN_NONE))
> +   if (security_locked_down(current_cred(), LOCKDOWN_NONE))

Acked-by: Dan Williams 

...however that usage looks wrong. The expectation is that if kernel
integrity protections are enabled then raw command access should be
disabled. So I think that should be equivalent to LOCKDOWN_PCI_ACCESS
in terms of the command capabilities to filter.
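
Concretely, the check might become something like this sketch (the
reason swap is the suggestion above; the surrounding function body is
abridged from the quoted driver):

static bool cxl_mem_raw_command_allowed(u16 opcode)
{
        if (!IS_ENABLED(CONFIG_CXL_MEM_RAW_COMMANDS))
                return false;

        /* raw commands can reprogram the device: treat as raw PCI access */
        if (security_locked_down(current_cred(), LOCKDOWN_PCI_ACCESS))
                return false;

        /* ... remaining opcode checks elided ... */
        return true;
}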


Re: [PATCH 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes

2021-06-18 Thread Fabiano Rosas
"Pratik R. Sampat"  writes:

> Adds a generic interface to represent the energy and frequency related
> PAPR attributes on the system using the new H_CALL
> "H_GET_ENERGY_SCALE_INFO".
>
> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
> will be deprecated P10 onwards.
>
> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
> hcall(
>   uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
>   uint64 flags,   // Per the flag request
>   uint64 firstAttributeId,// The attribute id
>   uint64 bufferAddress,   // Guest physical address of the output buffer
>   uint64 bufferSize   // The size in bytes of the output buffer
> );
>
> This H_CALL can query either all the attributes at once with
> firstAttributeId = 0, flags = 0 as well as query only one attribute
> at a time with firstAttributeId = id
>
> The output buffer consists of the following
> 1. number of attributes  - 8 bytes
> 2. array offset to the data location - 8 bytes
> 3. version info  - 1 byte
> 4. A data array of size num attributes, which contains the following:
>   a. attribute ID  - 8 bytes
>   b. attribute value in number - 8 bytes
>   c. attribute name in string  - 64 bytes
>   d. attribute value in string - 64 bytes
>
> The new H_CALL exports information in direct string value format, hence
> a new interface has been introduced in
> /sys/firmware/papr/energy_scale_info to export this information to
> userspace in an extensible pass-through format.
>
> The H_CALL returns the name, numeric value and string value (if exists)
>
> The format of exposing the sysfs information is as follows:
> /sys/firmware/papr/energy_scale_info/
>|-- /
>  |-- desc
>  |-- value
>  |-- value_desc (if exists)
>|-- /
>  |-- desc
>  |-- value
>  |-- value_desc (if exists)
> ...
>
> The energy information that is exported is useful for userspace tools
> such as powerpc-utils. Currently these tools infer the
> "power_mode_data" value in the lparcfg, which in turn is obtained from
> the to be deprecated H_GET_EM_PARMS H_CALL.
> On future platforms, such userspace utilities will have to look at the
> data returned from the new H_CALL being populated in this new sysfs
> interface and report this information directly without the need of
> interpretation.
>
> Signed-off-by: Pratik R. Sampat 
> ---
>  .../sysfs-firmware-papr-energy-scale-info |  26 ++
>  arch/powerpc/include/asm/hvcall.h |  21 +-
>  arch/powerpc/kvm/trace_hv.h   |   1 +
>  arch/powerpc/platforms/pseries/Makefile   |   3 +-
>  .../pseries/papr_platform_attributes.c| 292 ++
>  5 files changed, 341 insertions(+), 2 deletions(-)
>  create mode 100644 
> Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
>  create mode 100644 arch/powerpc/platforms/pseries/papr_platform_attributes.c
>
> diff --git a/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info 
> b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> new file mode 100644
> index ..499bc1ae173a
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-firmware-papr-energy-scale-info
> @@ -0,0 +1,26 @@
> +What:/sys/firmware/papr/energy_scale_info
> +Date:June 2021
> +Contact: Linux for PowerPC mailing list 
> +Description: Directory hosting a set of platform attributes on Linux
> + running as a PAPR guest.

This still refers to papr attributes. We want energy/frequency, etc. instead.

> +
> + Each file in a directory contains a platform
> + attribute hierarchy pertaining to performance/
> + energy-savings mode and processor frequency.
> +
> +What:/sys/firmware/papr/energy_scale_info/
> + /sys/firmware/papr/energy_scale_info//desc
> + /sys/firmware/papr/energy_scale_info//value
> + /sys/firmware/papr/energy_scale_info//value_desc
> +Date:June 2021
> +Contact: Linux for PowerPC mailing list 
> +Description: PAPR attributes directory for POWERVM servers

Same here.

> +
> + This directory provides PAPR information. It

And here.

> + contains below sysfs attributes:
> +
> + - desc: File contains the name of attribute 

desc: String description of the attribute 

> +
> + - value: Numeric value of attribute 
> +
> + - value_desc: String value of attribute 
> diff --git a/arch/powerpc/include/asm/hvcall.h 
> b/arch/powerpc/include/asm/hvcall.h
> index e3b29eda8074..19a2a8c77a49 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -316,7 +316,8 @@
>  #define H_SCM_PERFORMANCE_STATS 0x418
>  #define H_RPT_INVALIDATE 0x448
>  #define H_SCM_FLUSH  0x44C
> -#define MAX_HCALL_OPCODE H_SCM_FLUSH
> +#define H

Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-18 Thread Mathieu Desnoyers
- On Jun 18, 2021, at 3:58 PM, Andy Lutomirski l...@kernel.org wrote:

> On Fri, Jun 18, 2021, at 9:31 AM, Mathieu Desnoyers wrote:
>> - On Jun 17, 2021, at 8:12 PM, Andy Lutomirski l...@kernel.org wrote:
>> 
>> > On 6/17/21 7:47 AM, Mathieu Desnoyers wrote:
>> > 
>> >> Please change back this #ifndef / #else / #endif within function for
>> >> 
>> >> if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
>> >>   ...
>> >> } else {
>> >>   ...
>> >> }
>> >> 
>> >> I don't think mixing up preprocessor and code logic makes it more 
>> >> readable.
>> > 
>> > I agree, but I don't know how to make the result work well.
>> > membarrier_sync_core_before_usermode() isn't defined in the !IS_ENABLED
>> > case, so either I need to fake up a definition or use #ifdef.
>> > 
>> > If I faked up a definition, I would want to assert, at build time, that
>> > it isn't called.  I don't think we can do:
>> > 
>> > static void membarrier_sync_core_before_usermode()
>> > {
>> >BUILD_BUG_IF_REACHABLE();
>> > }
>> 
>> Let's look at the context here:
>> 
>> static void ipi_sync_core(void *info)
>> {
>> []
>> membarrier_sync_core_before_usermode()
>> }
>> 
>> ^ this can be within #ifdef / #endif
>> 
>> static int membarrier_private_expedited(int flags, int cpu_id)
>> [...]
>>if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
>> return -EINVAL;
>> if (!(atomic_read(&mm->membarrier_state) &
>>   MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
>> return -EPERM;
>> ipi_func = ipi_sync_core;
>> 
>> All we need to make the line above work is to define an empty ipi_sync_core
>> function in the #else case after the ipi_sync_core() function definition.
>> 
>> Or am I missing your point ?
> 
> Maybe?
> 
> My objection is that an empty ipi_sync_core is a lie — it doesn’t sync the 
> core.
> I would be fine with that if I could have the compiler statically verify that
> it’s not called, but I’m uncomfortable having it there if the implementation 
> is
> actively incorrect.

I see. Another approach would be to implement a "setter" function to populate
"ipi_func". That setter function would return -EINVAL in its #ifndef 
CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
implementation.

Would that be better ?
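
As a sketch (smp_call_func_t is from include/linux/smp.h and
ipi_sync_core() is the existing function in kernel/sched/membarrier.c;
the setter name itself is made up):

#ifdef CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE
static int membarrier_get_sync_core_ipi(smp_call_func_t *ipi_func)
{
        *ipi_func = ipi_sync_core;
        return 0;
}
#else
static int membarrier_get_sync_core_ipi(smp_call_func_t *ipi_func)
{
        return -EINVAL; /* this arch cannot sync cores from an IPI */
}
#endif

membarrier_private_expedited() would then fail on the returned -EINVAL
without ever naming ipi_sync_core in the unsupported configuration, so
no stub that "lies" is needed.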

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-18 Thread Andy Lutomirski



On Fri, Jun 18, 2021, at 9:31 AM, Mathieu Desnoyers wrote:
> - On Jun 17, 2021, at 8:12 PM, Andy Lutomirski l...@kernel.org wrote:
> 
> > On 6/17/21 7:47 AM, Mathieu Desnoyers wrote:
> > 
> >> Please change back this #ifndef / #else / #endif within function for
> >> 
> >> if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
> >>   ...
> >> } else {
> >>   ...
> >> }
> >> 
> >> I don't think mixing up preprocessor and code logic makes it more readable.
> > 
> > I agree, but I don't know how to make the result work well.
> > membarrier_sync_core_before_usermode() isn't defined in the !IS_ENABLED
> > case, so either I need to fake up a definition or use #ifdef.
> > 
> > If I faked up a definition, I would want to assert, at build time, that
> > it isn't called.  I don't think we can do:
> > 
> > static void membarrier_sync_core_before_usermode()
> > {
> >BUILD_BUG_IF_REACHABLE();
> > }
> 
> Let's look at the context here:
> 
> static void ipi_sync_core(void *info)
> {
> []
> membarrier_sync_core_before_usermode()
> }
> 
> ^ this can be within #ifdef / #endif
> 
> static int membarrier_private_expedited(int flags, int cpu_id)
> [...]
>if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
> return -EINVAL;
> if (!(atomic_read(&mm->membarrier_state) &
>   MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
> return -EPERM;
> ipi_func = ipi_sync_core;
> 
> All we need to make the line above work is to define an empty ipi_sync_core
> function in the #else case after the ipi_sync_core() function definition.
> 
> Or am I missing your point ?

Maybe?

My objection is that an empty ipi_sync_core is a lie — it doesn’t sync the 
core.  I would be fine with that if I could have the compiler statically verify 
that it’s not called, but I’m uncomfortable having it there if the 
implementation is actively incorrect.


Re: [PATCH 01/18] mm: add a kunmap_local_dirty helper

2021-06-18 Thread Ira Weiny
On Fri, Jun 18, 2021 at 11:37:28AM +0800, Herbert Xu wrote:
> On Thu, Jun 17, 2021 at 08:01:57PM -0700, Ira Weiny wrote:
> >
> > > + flush_kernel_dcache_page(__page);   \
> > 
> > Is this required on 32bit systems?  Why is kunmap_flush_on_unmap() not
> > sufficient on 64bit systems?  The normal kunmap_local() path does that.
> > 
> > I'm sorry but I did not see a conclusion to my query on V1. Herbert implied 
> > the
> > he just copied from the crypto code.[1]  I'm concerned that this _dirty() 
> > call
> > is just going to confuse the users of kmap even more.  So why can't we get 
> > to
> > the bottom of why flush_kernel_dcache_page() needs so much logic around it
> > before complicating the general kernel users.
> > 
> > I would like to see it go away if possible.
> 
> This thread may be related:
> 
> https://lwn.net/Articles/240249/

Interesting!  Thanks!

Digging around a bit more I found:

https://lore.kernel.org/patchwork/patch/439637/

Auditing all the flush_dcache_page() arch code reveals that the mapping field
is either unused, or is checked for NULL.  Furthermore, all the implementations
call page_mapping_file() which further limits the page to not be a swap page.

All flush_kernel_dcache_page() implementations appear to operate the same way
in all arches which define that call.

So I'm confident now that additional !PageSlab(__page) checks are not needed
and this patch is unnecessary.   Christoph, can we leave this out of the kmap
API and just fold the flush_kernel_dcache_page() calls back into the bvec code?

Unfortunately, I'm not convinced this can be handled completely by
kunmap_local() nor the mem*_page() calls because there is a difference between
flush_dcache_page() and flush_kernel_dcache_page() in most archs...  [parisc
being an exception which falls back to flush_kernel_dcache_page()]...

It seems like the generic unmap path _should_ be able to determine which call
to make based on the page but I'd have to look at that more.
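
Purely as a sketch of that idea (the helper and its call site are
hypothetical; only page_mapping_file(), flush_dcache_page() and
flush_kernel_dcache_page() are existing interfaces, and whether this
dispatch is actually sufficient is exactly the open question):

#include <linux/highmem.h>
#include <linux/pagemap.h>

/* hypothetical: called from the generic kunmap path on dirty unmap */
static inline void kunmap_flush_for_page(struct page *page)
{
        if (page_mapping_file(page))
                flush_dcache_page(page);        /* may alias a user mapping */
        else
                flush_kernel_dcache_page(page); /* kernel-only modification */
}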

Ira


Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()

2021-06-18 Thread Mathieu Desnoyers
- On Jun 18, 2021, at 1:13 PM, Christophe Leroy christophe.le...@csgroup.eu 
wrote:
[...]
> 
> I don't understand all that complexity to just replace a simple
> 'smp_mb__after_unlock_lock()'.
> 
> #define smp_mb__after_unlock_lock()   smp_mb()
> #define smp_mb()  barrier()
> # define barrier() __asm__ __volatile__("": : :"memory")
> 
> 
> Am I missing some subtlety?

On powerpc CONFIG_SMP, smp_mb() is actually defined as:

#define smp_mb()__smp_mb()
#define __smp_mb()  mb()
#define mb()   __asm__ __volatile__ ("sync" : : : "memory")

So the original motivation here was to skip a "sync" instruction whenever
switching between threads which are part of the same process. But based on
recent discussions, I suspect my implementation may be inaccurately doing
so though.
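
For reference, the powerpc hook that performs this skip looks roughly
like the following (from arch/powerpc/include/asm/membarrier.h, with
the comments trimmed; check the tree for the exact wording):

static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
                                             struct mm_struct *next,
                                             struct task_struct *tsk)
{
        /* Only a process-to-process switch where the next mm has
         * registered for expedited membarrier needs the full barrier. */
        if (likely(!(atomic_read(&next->membarrier_state) &
                     (MEMBARRIER_STATE_PRIVATE_EXPEDITED |
                      MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev))
                return;

        smp_mb();
}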

Thanks,

Mathieu


> 
> Thanks
> Christophe

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH for 4.16 v7 02/11] powerpc: membarrier: Skip memory barrier in switch_mm()

2021-06-18 Thread Christophe Leroy




On 29/01/2018 at 21:20, Mathieu Desnoyers wrote:

Allow PowerPC to skip the full memory barrier in switch_mm(), and
only issue the barrier when scheduling into a task belonging to a
process that has registered to use expedited private.

Threads targeting the same VM but which belong to different thread
groups is a tricky case. It has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates on the
thread group.

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag, issue synchronize_sched(), and
only then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows
private expedited membarrier commands to succeed.
membarrier_arch_switch_mm() now tests for the
MEMBARRIER_STATE_PRIVATE_EXPEDITED flag.


Looking at switch_mm_irqs_off(), I found it more complex than expected, and this patch is
the reason for that complexity.


Before the patch (ie in kernel 4.14), we have:

 :
   0:   81 24 01 c8 lwz r9,456(r4)
   4:   71 29 00 01 andi.   r9,r9,1
   8:   40 82 00 1c bne 24 
   c:   39 24 01 c8 addir9,r4,456
  10:   39 40 00 01 li  r10,1
  14:   7d 00 48 28 lwarx   r8,0,r9
  18:   7d 08 53 78 or  r8,r8,r10
  1c:   7d 00 49 2d stwcx.  r8,0,r9
  20:   40 c2 ff f4 bne-14 
  24:   7c 04 18 40 cmplw   r4,r3
  28:   81 24 00 24 lwz r9,36(r4)
  2c:   91 25 04 4c stw r9,1100(r5)
  30:   4d 82 00 20 beqlr
  34:   48 00 00 00 b   34 
34: R_PPC_REL24 switch_mmu_context


After the patch (ie in 5.13-rc6), that now is:

 :
   0:   81 24 02 18 lwz r9,536(r4)
   4:   71 29 00 01 andi.   r9,r9,1
   8:   41 82 00 24 beq 2c 
   c:   7c 04 18 40 cmplw   r4,r3
  10:   81 24 00 24 lwz r9,36(r4)
  14:   91 25 04 d0 stw r9,1232(r5)
  18:   4d 82 00 20 beqlr
  1c:   81 24 00 28 lwz r9,40(r4)
  20:   71 29 00 0a andi.   r9,r9,10
  24:   40 82 00 34 bne 58 
  28:   48 00 00 00 b   28 
28: R_PPC_REL24 switch_mmu_context
  2c:   39 24 02 18 addir9,r4,536
  30:   39 40 00 01 li  r10,1
  34:   7d 00 48 28 lwarx   r8,0,r9
  38:   7d 08 53 78 or  r8,r8,r10
  3c:   7d 00 49 2d stwcx.  r8,0,r9
  40:   40 a2 ff f4 bne 34 
  44:   7c 04 18 40 cmplw   r4,r3
  48:   81 24 00 24 lwz r9,36(r4)
  4c:   91 25 04 d0 stw r9,1232(r5)
  50:   4d 82 00 20 beqlr
  54:   48 00 00 00 b   54 
54: R_PPC_REL24 switch_mmu_context
  58:   2c 03 00 00 cmpwi   r3,0
  5c:   41 82 ff cc beq 28 
  60:   48 00 00 00 b   60 
60: R_PPC_REL24 switch_mmu_context


Especially, the comparison of 'prev' to 0 is pointless as both cases end up with just branching to 
'switch_mmu_context'


I don't understand all that complexity to just replace a simple 
'smp_mb__after_unlock_lock()'.

#define smp_mb__after_unlock_lock() smp_mb()
#define smp_mb()barrier()
# define barrier() __asm__ __volatile__("": : :"memory")


Am I missing some subtlety?

Thanks
Christophe


Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-18 Thread Mathieu Desnoyers
- On Jun 17, 2021, at 8:12 PM, Andy Lutomirski l...@kernel.org wrote:

> On 6/17/21 7:47 AM, Mathieu Desnoyers wrote:
> 
>> Please change back this #ifndef / #else / #endif within function for
>> 
>> if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE)) {
>>   ...
>> } else {
>>   ...
>> }
>> 
>> I don't think mixing up preprocessor and code logic makes it more readable.
> 
> I agree, but I don't know how to make the result work well.
> membarrier_sync_core_before_usermode() isn't defined in the !IS_ENABLED
> case, so either I need to fake up a definition or use #ifdef.
> 
> If I faked up a definition, I would want to assert, at build time, that
> it isn't called.  I don't think we can do:
> 
> static void membarrier_sync_core_before_usermode()
> {
>BUILD_BUG_IF_REACHABLE();
> }

Let's look at the context here:

static void ipi_sync_core(void *info)
{
[]
membarrier_sync_core_before_usermode()
}

^ this can be within #ifdef / #endif

static int membarrier_private_expedited(int flags, int cpu_id)
[...]
   if (!IS_ENABLED(CONFIG_ARCH_HAS_MEMBARRIER_SYNC_CORE))
return -EINVAL;
if (!(atomic_read(&mm->membarrier_state) &
  MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY))
return -EPERM;
ipi_func = ipi_sync_core;

All we need to make the line above work is to define an empty ipi_sync_core
function in the #else case after the ipi_sync_core() function definition.

Or am I missing your point ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: [PATCH 8/8] membarrier: Rewrite sync_core_before_usermode() and improve documentation

2021-06-18 Thread Christophe Leroy




On 16/06/2021 at 20:52, Andy Lutomirski wrote:

On 6/15/21 9:45 PM, Nicholas Piggin wrote:

Excerpts from Andy Lutomirski's message of June 16, 2021 1:21 pm:

The old sync_core_before_usermode() comments suggested that a non-icache-syncing
return-to-usermode instruction is x86-specific and that all other
architectures automatically notice cross-modified code on return to
userspace.



+/*
+ * XXX: can a powerpc person put an appropriate comment here?
+ */
+static inline void membarrier_sync_core_before_usermode(void)
+{
+}
+
+#endif /* _ASM_POWERPC_SYNC_CORE_H */


powerpc's can just go in asm/membarrier.h


$ ls arch/powerpc/include/asm/membarrier.h
ls: cannot access 'arch/powerpc/include/asm/membarrier.h': No such file
or directory


https://github.com/torvalds/linux/blob/master/arch/powerpc/include/asm/membarrier.h


Was added by 
https://github.com/torvalds/linux/commit/3ccfebedd8cf54e291c809c838d8ad5cc00f5688






/*
  * The RFI family of instructions are context synchronising, and
  * that is how we return to userspace, so nothing is required here.
  */


Thanks!



Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-18 Thread Christoph Hellwig
On Fri, Jun 18, 2021 at 09:09:17AM -0500, Tom Lendacky wrote:
> > swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem
> > and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so
> > swiotlb_init_with_tbl is also good.
> > I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think
> > it's clearer and safer.
> 
> On x86, if the memset is done before set_memory_decrypted() and memory
> encryption is active, then the memory will look like ciphertext afterwards
> and not be zeroes. If zeroed memory is required, then a memset must be
> done after the set_memory_decrypted() calls.

Which should be fine - we don't care that the memory is cleared to 0,
just that it doesn't leak other data.  Maybe a comment would be useful,
though,
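
Something along these lines, say (a sketch with a made-up helper name,
not the patch's code):

static void swiotlb_decrypt_and_zero(void *vaddr, unsigned long bytes)
{
        /* Flip the mapping to shared/decrypted first... */
        set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
        /* ...then zero through it so the buffer really reads back as 0.
         * Zeroing before the flip would leave ciphertext: fine for
         * avoiding data leaks, but not actually zeroed memory. */
        memset(vaddr, 0, bytes);
}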


Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-18 Thread Tom Lendacky
On 6/18/21 1:25 AM, Claire Chang wrote:
> On Fri, Jun 18, 2021 at 7:30 AM Stefano Stabellini
>  wrote:
>>
>> On Thu, 17 Jun 2021, Claire Chang wrote:
>>> Add a new function, swiotlb_init_io_tlb_mem, for the io_tlb_mem struct
>>> initialization to make the code reusable.
>>>
>>> Signed-off-by: Claire Chang 
>>> Reviewed-by: Christoph Hellwig 
>>> Tested-by: Stefano Stabellini 
>>> Tested-by: Will Deacon 
>>> ---
>>>  kernel/dma/swiotlb.c | 50 ++--
>>>  1 file changed, 25 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>> index 52e2ac526757..47bb2a766798 100644
>>> --- a/kernel/dma/swiotlb.c
>>> +++ b/kernel/dma/swiotlb.c
>>> @@ -168,9 +168,28 @@ void __init swiotlb_update_mem_attributes(void)
>>>   memset(vaddr, 0, bytes);
>>>  }
>>>
>>> -int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
>>> verbose)
>>> +static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t 
>>> start,
>>> + unsigned long nslabs, bool late_alloc)
>>>  {
>>> + void *vaddr = phys_to_virt(start);
>>>   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>>> +
>>> + mem->nslabs = nslabs;
>>> + mem->start = start;
>>> + mem->end = mem->start + bytes;
>>> + mem->index = 0;
>>> + mem->late_alloc = late_alloc;
>>> + spin_lock_init(&mem->lock);
>>> + for (i = 0; i < mem->nslabs; i++) {
>>> + mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>>> + mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>>> + mem->slots[i].alloc_size = 0;
>>> + }
>>> + memset(vaddr, 0, bytes);
>>> +}
>>> +
>>> +int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int 
>>> verbose)
>>> +{
>>>   struct io_tlb_mem *mem;
>>>   size_t alloc_size;
>>>
>>> @@ -186,16 +205,8 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned 
>>> long nslabs, int verbose)
>>>   if (!mem)
>>>   panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
>>> __func__, alloc_size, PAGE_SIZE);
>>> - mem->nslabs = nslabs;
>>> - mem->start = __pa(tlb);
>>> - mem->end = mem->start + bytes;
>>> - mem->index = 0;
>>> - spin_lock_init(&mem->lock);
>>> - for (i = 0; i < mem->nslabs; i++) {
>>> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>>> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>>> - mem->slots[i].alloc_size = 0;
>>> - }
>>> +
>>> + swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
>>>
>>>   io_tlb_default_mem = mem;
>>>   if (verbose)
>>> @@ -282,8 +293,8 @@ swiotlb_late_init_with_default_size(size_t default_size)
>>>  int
>>>  swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
>>>  {
>>> - unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
>>>   struct io_tlb_mem *mem;
>>> + unsigned long bytes = nslabs << IO_TLB_SHIFT;
>>>
>>>   if (swiotlb_force == SWIOTLB_NO_FORCE)
>>>   return 0;
>>> @@ -297,20 +308,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long 
>>> nslabs)
>>>   if (!mem)
>>>   return -ENOMEM;
>>>
>>> - mem->nslabs = nslabs;
>>> - mem->start = virt_to_phys(tlb);
>>> - mem->end = mem->start + bytes;
>>> - mem->index = 0;
>>> - mem->late_alloc = 1;
>>> - spin_lock_init(&mem->lock);
>>> - for (i = 0; i < mem->nslabs; i++) {
>>> - mem->slots[i].list = IO_TLB_SEGSIZE - io_tlb_offset(i);
>>> - mem->slots[i].orig_addr = INVALID_PHYS_ADDR;
>>> - mem->slots[i].alloc_size = 0;
>>> - }
>>> -
>>> + memset(mem, 0, sizeof(*mem));
>>> + swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
>>>   set_memory_decrypted((unsigned long)tlb, bytes >> PAGE_SHIFT);
>>> - memset(tlb, 0, bytes);
>>
>> This is good for swiotlb_late_init_with_tbl. However I have just noticed
>> that mem could also be allocated from swiotlb_init_with_tbl, in which
>> case the zeroing is missing. I think we need another memset in
>> swiotlb_init_with_tbl as well. Or maybe it could be better to have a
>> single memset at the beginning of swiotlb_init_io_tlb_mem instead. Up to
>> you.
> 
> swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem
> and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so
> swiotlb_init_with_tbl is also good.
> I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think
> it's clearer and safer.

On x86, if the memset is done before set_memory_decrypted() and memory
encryption is active, then the memory will look like ciphertext afterwards
and not be zeroes. If zeroed memory is required, then a memset must be
done after the set_memory_decrypted() calls.

Thanks,
Tom

> 
> [1] 
> https://elixir.bootlin.com/linux/v5.13-rc6/source/include/linux/memblock.h#L407

Re: [PATCH v2 5/9] powerpc/microwatt: Use standard 16550 UART for console

2021-06-18 Thread Paul Mackerras
On Fri, Jun 18, 2021 at 05:40:40PM +1000, Nicholas Piggin wrote:
> Excerpts from Paul Mackerras's message of June 18, 2021 1:46 pm:
> > From: Benjamin Herrenschmidt 
> > 
> > This adds support to the Microwatt platform to use the standard
> > 16550-style UART which is available in the standalone Microwatt FPGA.
> > 
> > Signed-off-by: Benjamin Herrenschmidt 
> > Signed-off-by: Paul Mackerras 
...
> > +#ifdef CONFIG_PPC_EARLY_DEBUG_MICROWATT
> > +
> > +#define UDBG_UART_MW_ADDR  ((void __iomem *)0xc0002000)
> > +
> > +static u8 udbg_uart_in_isa300_rm(unsigned int reg)
> > +{
> > +   uint64_t msr = mfmsr();
> > +   uint8_t  c;
> > +
> > +   mtmsr(msr & ~(MSR_EE|MSR_DR));
> > +   isync();
> > +   eieio();
> > +   c = __raw_rm_readb(UDBG_UART_MW_ADDR + (reg << 2));
> > +   mtmsr(msr);
> > +   isync();
> > +   return c;
> > +}
> 
> Why is realmode required? No cache inhibited mappings yet?

Because it's EARLY debug, for use in the very early stages of boot
when the kernel's radix tree may or may not have been initialized.
The easiest way to make a function that works correctly whether or not
the radix tree has been initialized and the MMU turned on is to
temporarily turn off the MMU for data accesses and use lbzcix/stbcix
(which Microwatt has, even though it doesn't implement hypervisor
mode).

(I don't know which "yet" you meant - "yet" in the process of booting a
kernel, or "yet" in the process of Microwatt's development?  Microwatt
certainly does have cache-inhibited mappings and has done since the
MMU was first introduced.)

In fact the defconfig I add later in the series doesn't enable
CONFIG_PPC_EARLY_DEBUG_MICROWATT, but it's there if it's needed for
debugging.

> mtmsrd with L=0 is defined to be context synchronizing in isa 3, so I 
> don't think the isync would be required. There is a bit of code around 
> arch/powerpc that does this, maybe it used to be needed or some other
> implementations needed it?
> 
> That's just for my curiosity, it doesn't really hurt to have them
> there.

Right, and in fact mtmsrd is marked as a single-issue instruction in
Microwatt, so it should work with no isyncs or eieios.  Presumably Ben
copied the isync/eieio pattern from somewhere else.
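
If so, the accessor could presumably shrink to just the two mtmsr()s.
A sketch, relying purely on mtmsrd (L=0) being context-synchronising:

static u8 udbg_uart_in_isa300_rm(unsigned int reg)
{
        uint64_t msr = mfmsr();
        uint8_t  c;

        mtmsr(msr & ~(MSR_EE | MSR_DR));        /* context-synchronising */
        c = __raw_rm_readb(UDBG_UART_MW_ADDR + (reg << 2));
        mtmsr(msr);                             /* context-synchronising */
        return c;
}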

Paul.


linux-next: manual merge of the akpm-current tree with the powerpc tree

2021-06-18 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the akpm-current tree got a conflict in:

  arch/powerpc/kernel/smp.c

between commit:

  86f46f343272 ("powerpc/32s: Initialise KUAP and KUEP in C")

from the powerpc tree and commit:

  103e676c91d0 ("lazy tlb: introduce lazy mm refcount helper functions")

from the akpm-current tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc arch/powerpc/kernel/smp.c
index b83a59ce9beb,b289f1d213f8..
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@@ -1541,11 -1541,7 +1541,11 @@@ void start_secondary(void *unused
  {
unsigned int cpu = raw_smp_processor_id();
  
 +  /* PPC64 calls setup_kup() in early_setup_secondary() */
 +  if (IS_ENABLED(CONFIG_PPC32))
 +  setup_kup();
 +
-   mmgrab(&init_mm);
+   mmgrab_lazy_tlb(&init_mm);
current->active_mm = &init_mm;
  
smp_store_cpu_info(cpu);
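
The conflict is purely textual: at this point in the lazy tlb series,
mmgrab_lazy_tlb() is believed to be just a wrapper around mmgrab(), so
the resolution keeps both changes side by side. A sketch of the helper
as the akpm-current commit introduces it (the exact definition may
differ):

static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
{
        mmgrab(mm);
}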




Re: [PATCH 1/1] powerpc/pseries: Interface to represent PAPR firmware attributes

2021-06-18 Thread Gautham R Shenoy
On Wed, Jun 16, 2021 at 07:12:40PM +0530, Pratik R. Sampat wrote:
> Adds a generic interface to represent the energy and frequency related
> PAPR attributes on the system using the new H_CALL
> "H_GET_ENERGY_SCALE_INFO".
> 
> H_GET_EM_PARMS H_CALL was previously responsible for exporting this
> information in the lparcfg, however the H_GET_EM_PARMS H_CALL
> will be deprecated from P10 onwards.
> 
> The H_GET_ENERGY_SCALE_INFO H_CALL is of the following call format:
> hcall(
>   uint64 H_GET_ENERGY_SCALE_INFO,  // Get energy scale info
>   uint64 flags,   // Per the flag request
>   uint64 firstAttributeId,// The attribute id
>   uint64 bufferAddress,   // Guest physical address of the output buffer
>   uint64 bufferSize   // The size in bytes of the output buffer
> );
> 
> This H_CALL can query either all the attributes at once with
> firstAttributeId = 0, flags = 0, or only one attribute at a time with
> firstAttributeId = id.


For a single attribute, the “firstAttributeId” must be set by the
caller to the attribute id to retrieve and the “singleAttribute” field
in “flags” must also be set to 1.

If we don't set the "flags" to 1, while specifying a firstAttributeId,
the hypervisor will populate the buffer with the details pertaining to
all the attributes beginning with firstAttributeId.



> 
> The output buffer consists of the following
> 1. number of attributes  - 8 bytes
> 2. array offset to the data location - 8 bytes
> 3. version info  - 1 byte
> 4. A data array of size num attributes, which contains the following:
>   a. attribute ID  - 8 bytes
>   b. attribute value in number - 8 bytes
>   c. attribute name in string  - 64 bytes
>   d. attribute value in string - 64 bytes
> 
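
That layout corresponds to structures along the following lines,
reconstructed here from the field names used later in the patch; the
names of the ID and description fields are guesses:

struct energy_scale_attributes {
        __be64 attr_id;                  /* a. attribute ID */
        __be64 attr_value;               /* b. attribute value in number */
        char attr_desc[64];              /* c. attribute name in string */
        char attr_value_desc[64];        /* d. attribute value in string */
};

struct hv_energy_scale_buffer {
        __be64 num_attr;                 /* 1. number of attributes */
        __be64 array_offset;             /* 2. offset to the data location */
        u8 data_header_version;          /* 3. version info */
        u8 data[];                       /* 4. array of attributes */
};
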
[..snip..]

> +
> +static ssize_t papr_show_value(struct kobject *kobj,
> + struct kobj_attribute *attr,
> + char *buf)
> +{
> + struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
> + struct hv_energy_scale_buffer *t_buf;
> + struct energy_scale_attributes *t_ea;
> + int data_offset, ret = 0;
> +
> + t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
> + if (t_buf == NULL)
> + return -ENOMEM;
> +
> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
> +  pattr->id, virt_to_phys(t_buf),
> +  sizeof(*t_buf));
> +


In this case, since we are interested in only one attribute, we can
make the call

ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 1,
 pattr->id, virt_to_phys(t_buf),
 sizeof(*t_buf));

setting flags=1.

Same in the function papr_show_value_desc() below

--
Thanks and Regards
gautham.


> + if (ret != H_SUCCESS) {
> + pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> + goto out;
> + }
> +
> + data_offset = be64_to_cpu(t_buf->array_offset) -
> + (sizeof(t_buf->num_attr) +
> + sizeof(t_buf->array_offset) +
> + sizeof(t_buf->data_header_version));
> +
> + t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
> +
> + ret = sprintf(buf, "%llu\n", be64_to_cpu(t_ea->attr_value));
> + if (ret < 0)
> + ret = -EIO;
> +out:
> + kfree(t_buf);
> +
> + return ret;
> +}
> +
> +static ssize_t papr_show_value_desc(struct kobject *kobj,
> +  struct kobj_attribute *attr,
> +  char *buf)
> +{
> + struct papr_attr *pattr = container_of(attr, struct papr_attr, attr);
> + struct hv_energy_scale_buffer *t_buf;
> + struct energy_scale_attributes *t_ea;
> + int data_offset, ret = 0;
> +
> + t_buf = kmalloc(sizeof(*t_buf), GFP_KERNEL);
> + if (t_buf == NULL)
> + return -ENOMEM;
> +
> + ret = plpar_hcall_norets(H_GET_ENERGY_SCALE_INFO, 0,
> +  pattr->id, virt_to_phys(t_buf),
> +  sizeof(*t_buf));
> +
> + if (ret != H_SUCCESS) {
> + pr_warn("hcall failed: H_GET_ENERGY_SCALE_INFO");
> + goto out;
> + }
> +
> + data_offset = be64_to_cpu(t_buf->array_offset) -
> + (sizeof(t_buf->num_attr) +
> + sizeof(t_buf->array_offset) +
> + sizeof(t_buf->data_header_version));
> +
> + t_ea = (struct energy_scale_attributes *) &t_buf->data[data_offset];
> +
> + ret = sprintf(buf, "%s\n", t_ea->attr_value_desc);
> + if (ret < 0)
> + ret = -EIO;
> +out:
> + kfree(t_buf);
> +
> + return ret;
> +}
> +
> +static struct papr_ops_info {
> + const char *attr_name;
> + ssize_t (*show)(struct kobject *kobj, struct kobj_attribute *attr,
> + char *buf);
> +} ops_info[] = {
> + { "desc", papr_show_desc },
> + { "va

Re: [PATCH 4/4] powerpc: Enable KFENCE on BOOK3S/64

2021-06-18 Thread Daniel Axtens
Daniel Axtens  writes:

>> +#ifdef CONFIG_PPC64
>> +static inline bool kfence_protect_page(unsigned long addr, bool protect)
>> +{
>> +struct page *page;
>> +
>> +page = virt_to_page(addr);
>> +if (protect)
>> +__kernel_map_pages(page, 1, 0);
>> +else
>> +__kernel_map_pages(page, 1, 1);
>
> I lose track of the type conversions and code conventions involved, but
> can we do something like __kernel_map_pages(page, 1, !!protect)?

Ah, I missed that the if changed the truth/falsity. !protect, not
!!protect :P
>
> Apart from that, this seems good.
>
> Kind regards,
> Daniel
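
With that correction, the suggested simplification would read roughly
as follows (a sketch; assumes the function otherwise returns true as
the kfence API expects):

static inline bool kfence_protect_page(unsigned long addr, bool protect)
{
        struct page *page = virt_to_page(addr);

        /* protect == true means unmap the page, so invert for "enable" */
        __kernel_map_pages(page, 1, !protect);

        return true;
}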


Re: [PATCH 4/4] powerpc: Enable KFENCE on BOOK3S/64

2021-06-18 Thread Daniel Axtens


> +#ifdef CONFIG_PPC64
> +static inline bool kfence_protect_page(unsigned long addr, bool protect)
> +{
> + struct page *page;
> +
> + page = virt_to_page(addr);
> + if (protect)
> + __kernel_map_pages(page, 1, 0);
> + else
> + __kernel_map_pages(page, 1, 1);

I lose track of the type conversions and code conventions involved, but
can we do something like __kernel_map_pages(page, 1, !!protect)?

Apart from that, this seems good.

Kind regards,
Daniel


Re: [PATCH v6 12/17] powerpc/pseries/vas: Integrate API with open/close windows

2021-06-18 Thread Haren Myneni
On Fri, 2021-06-18 at 09:22 +1000, Nicholas Piggin wrote:
> Excerpts from Haren Myneni's message of June 18, 2021 6:36 am:
> > This patch adds VAS window allocation/close with the corresponding
> > hcalls. Also changes to integrate with the existing user space VAS
> > API and provide register/unregister functions to NX pseries driver.
> > 
> > The driver register function is used to create the user space
> > interface (/dev/crypto/nx-gzip) and unregister to remove this
> > entry.
> > 
> > The user space process opens this device node and makes an ioctl
> > to allocate VAS window. The close interface is used to deallocate
> > window.
> > 
> > Signed-off-by: Haren Myneni 
> 
> Reviewed-by: Nicholas Piggin 
> 
> Unless there is some significant performance reason it might be simplest
> to take the mutex for the duration of the allocate and frees rather than
> taking it several times, covering the atomic with the lock instead.
> 
> You have a big lock, might as well use it and not have to wonder what if
> things race here or there.

The mutex protects window allocate/deallocate and IRQ setup/free, and
also protects updates to the window list. We do not need the lock for
the modify window hcall and other things, hence taking the mutex
several times. Atomics are also used for the counters (used_lpar_creds)
which can be exported in sysfs (that patch will be added later in the
next enhancement series).

Generally an application opens a window initially, does continuous
copy/paste operations and closes the window later. But it is possible
for the library / application to open/close a window for each request,
or to open or close multiple windows (say 1000, depending on the cores
on the system) at the same time. These cases may affect application
performance.
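
For illustration, the two locking patterns under discussion look
roughly like this (the names below are placeholders, not the patch's
actual helpers):

static DEFINE_MUTEX(vas_pseries_mutex);
static LIST_HEAD(vas_window_list);
static atomic_t used_lpar_creds = ATOMIC_INIT(0);

/* Fine-grained (as in the patch): lock only the list update; the
 * counter stays atomic so sysfs can read it without the mutex. */
static void vas_track_window(struct list_head *entry)
{
        mutex_lock(&vas_pseries_mutex);
        list_add(entry, &vas_window_list);
        mutex_unlock(&vas_pseries_mutex);
        atomic_inc(&used_lpar_creds);
}

/* Coarse-grained (Nick's suggestion): one critical section covers
 * both updates, so they can never be observed out of step. */
static void vas_track_window_coarse(struct list_head *entry)
{
        mutex_lock(&vas_pseries_mutex);
        list_add(entry, &vas_window_list);
        atomic_inc(&used_lpar_creds);
        mutex_unlock(&vas_pseries_mutex);
}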

Thanks
Haren

> 
> But don't rework that now, maybe just something to consider for
> later.
> 
> Thanks,
> Nick
> 



Re: [PATCH 2/4] powerpc/64s: Remove unneeded #ifdef CONFIG_DEBUG_PAGEALLOC in hash_utils

2021-06-18 Thread Daniel Axtens
Hi Jordan and Christophe,

> --- a/arch/powerpc/mm/book3s64/hash_utils.c
> +++ b/arch/powerpc/mm/book3s64/hash_utils.c
> @@ -126,11 +126,8 @@ EXPORT_SYMBOL_GPL(mmu_slb_size);
>  #ifdef CONFIG_PPC_64K_PAGES
>  int mmu_ci_restrictions;
>  #endif
> -#ifdef CONFIG_DEBUG_PAGEALLOC
>  static u8 *linear_map_hash_slots;
>  static unsigned long linear_map_hash_count;
> -static DEFINE_SPINLOCK(linear_map_hash_lock);
> -#endif /* CONFIG_DEBUG_PAGEALLOC */
>  struct mmu_hash_ops mmu_hash_ops;
>  EXPORT_SYMBOL(mmu_hash_ops);
>  

> @@ -1944,6 +1937,8 @@ long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
>  }
>  
>  #ifdef CONFIG_DEBUG_PAGEALLOC
> +static DEFINE_SPINLOCK(linear_map_hash_lock);
> +

I had some trouble figuring out why the spinlock has to be in the
ifdef. A bit of investigation suggests that it's only used in functions
that are only defined under CONFIG_DEBUG_PAGEALLOC - unlike the other
variables. So that makes sense.

While I was poking around, I noticed that linear_map_hash_slots is
manipulated under linear_map_hash_lock in kernel_(un)map_linear_page but
is manipulated outside the lock in htab_bolt_mapping(). Is that OK? (I
don't know when htab_bolt_mapping is called, it's possible it's only
called at times where nothing else could be happening to that array.)

Kind regards,
Daniel

>  static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
>  {
>   unsigned long hash;
> -- 
> 2.25.1


Re: [PATCH v2 5/9] powerpc/microwatt: Use standard 16550 UART for console

2021-06-18 Thread Nicholas Piggin
Excerpts from Paul Mackerras's message of June 18, 2021 1:46 pm:
> From: Benjamin Herrenschmidt 
> 
> This adds support to the Microwatt platform to use the standard
> 16550-style UART which is available in the standalone Microwatt FPGA.
> 
> Signed-off-by: Benjamin Herrenschmidt 
> Signed-off-by: Paul Mackerras 
> ---
>  arch/powerpc/boot/dts/microwatt.dts  | 27 
>  arch/powerpc/kernel/udbg_16550.c | 39 
>  arch/powerpc/platforms/microwatt/Kconfig |  1 +
>  arch/powerpc/platforms/microwatt/setup.c |  2 ++
>  4 files changed, 63 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/boot/dts/microwatt.dts b/arch/powerpc/boot/dts/microwatt.dts
> index 04e5dd92270e..974abbdda249 100644
> --- a/arch/powerpc/boot/dts/microwatt.dts
> +++ b/arch/powerpc/boot/dts/microwatt.dts
> @@ -6,6 +6,10 @@ / {
>   model-name = "microwatt";
>   compatible = "microwatt-soc";
>  
> + aliases {
> + serial0 = &UART0;
> + };
> +
>   reserved-memory {
>   #size-cells = <0x02>;
>   #address-cells = <0x02>;
> @@ -89,12 +93,6 @@ PowerPC,Microwatt@0 {
>   };
>   };
>  
> - chosen {
> - bootargs = "";
> - ibm,architecture-vec-5 = [19 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00
> -   00 00 00 00 00 00 00 00 40 00 40];
> - };
> -
>   soc@c0000000 {
>   compatible = "simple-bus";
>   #address-cells = <1>;
> @@ -119,5 +117,22 @@ ICS: interrupt-controller@5000 {
>   #interrupt-cells = <2>;
>   };
>  
> + UART0: serial@2000 {
> + device_type = "serial";
> + compatible = "ns16550";
> + reg = <0x2000 0x8>;
> + clock-frequency = <100000000>;
> + current-speed = <115200>;
> + reg-shift = <2>;
> + fifo-size = <16>;
> + interrupts = <0x10 0x1>;
> + };
> + };
> +
> + chosen {
> + bootargs = "";
> + ibm,architecture-vec-5 = [19 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00
> +   00 00 00 00 00 00 00 00 40 00 40];
> + stdout-path = &UART0;
>   };
>  };
> diff --git a/arch/powerpc/kernel/udbg_16550.c b/arch/powerpc/kernel/udbg_16550.c
> index 9356b60d6030..8513aa49614e 100644
> --- a/arch/powerpc/kernel/udbg_16550.c
> +++ b/arch/powerpc/kernel/udbg_16550.c
> @@ -296,3 +296,42 @@ void __init udbg_init_40x_realmode(void)
>  }
>  
>  #endif /* CONFIG_PPC_EARLY_DEBUG_40x */
> +
> +#ifdef CONFIG_PPC_EARLY_DEBUG_MICROWATT
> +
> +#define UDBG_UART_MW_ADDR  ((void __iomem *)0xc0002000)
> +
> +static u8 udbg_uart_in_isa300_rm(unsigned int reg)
> +{
> + uint64_t msr = mfmsr();
> + uint8_t  c;
> +
> + mtmsr(msr & ~(MSR_EE|MSR_DR));
> + isync();
> + eieio();
> + c = __raw_rm_readb(UDBG_UART_MW_ADDR + (reg << 2));
> + mtmsr(msr);
> + isync();
> + return c;
> +}

Why is realmode required? No cache inhibited mappings yet?

mtmsrd with L=0 is defined to be context synchronizing in isa 3, so I 
don't think the isync would be required. There is a bit of code around 
arch/powerpc that does this, maybe it used to be needed or some other
implementations needed it?

That's just for my curiosity, it doesn't really hurt to have them
there.

Thanks,
Nick


Re: [PATCH 1/4] powerpc/64s: Add DEBUG_PAGEALLOC for radix

2021-06-18 Thread Daniel Axtens
Jordan Niethe  writes:

> There is support for DEBUG_PAGEALLOC on hash but not on radix.
> Add support on radix.

Somewhat off-topic but I wonder at what point we can drop the weird !PPC
condition in mm/Kconfig.debug:

config DEBUG_PAGEALLOC
bool "Debug page memory allocations"
depends on DEBUG_KERNEL
depends on !HIBERNATION || ARCH_SUPPORTS_DEBUG_PAGEALLOC && !PPC && !SPARC

I can't figure out from git history why it exists or if it still serves
any function. Given that HIBERNATION isn't much use on Book3S systems it
probably never matters, it just bugs me a bit.

Again, nothing that has to block this series, just maybe something to
follow up at some vague and undefined point in the future!

> diff --git a/arch/powerpc/include/asm/book3s/64/pgtable.h b/arch/powerpc/include/asm/book3s/64/pgtable.h
> index a666d561b44d..b89482aed82a 100644
> --- a/arch/powerpc/include/asm/book3s/64/pgtable.h
> +++ b/arch/powerpc/include/asm/book3s/64/pgtable.h
> @@ -812,6 +822,15 @@ static inline bool check_pte_access(unsigned long access, unsigned long ptev)
>   * Generic functions with hash/radix callbacks
>   */
>  
> +#ifdef CONFIG_DEBUG_PAGEALLOC
> +static inline void __kernel_map_pages(struct page *page, int numpages, int enable)
> +{
> + if (radix_enabled())
> + radix__kernel_map_pages(page, numpages, enable);
> + hash__kernel_map_pages(page, numpages, enable);


Does this require an else? On radix we will call both radix__ and
hash__kernel_map_pages.
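
That is, with an explicit else the helper would read (a sketch of the
variant the question is asking about):

static inline void __kernel_map_pages(struct page *page, int numpages, int enable)
{
        if (radix_enabled())
                radix__kernel_map_pages(page, numpages, enable);
        else
                hash__kernel_map_pages(page, numpages, enable);
}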

I notice we call both hash__ and radix__ in map_kernel_page under radix,
so maybe this makes sense?

I don't fully understand the mechanism by which memory removal works: it
looks like on radix you mark the page as 'absent', which I think is
enough? Then you fall through to the hash code here:

for (i = 0; i < numpages; i++, page++) {
vaddr = (unsigned long)page_address(page);
lmi = __pa(vaddr) >> PAGE_SHIFT;
if (lmi >= linear_map_hash_count)
continue;

I think linear_map_hash_count will be zero unless it gets inited to a
non-zero value in htab_initialize(). I am fairly sure htab_initialize()
doesn't get called for a radix MMU. In that case you'll just `continue;`
out of every iteration of the loop, which would explain why nothing
weird would happen on radix.

Have I missed something here?

The rest of the patch looks good to me.

Kind regards,
Daniel


Re: [PATCH] Documentation: PCI: pci-error-recovery: rearrange the general sequence

2021-06-18 Thread Oliver O'Halloran
On Fri, Jun 18, 2021 at 4:05 PM Wesley Sheng  wrote:
>
> The reset_link() callback is actually called before mmio_enabled() in
> the pcie_do_recovery() function, so rearrange the general sequence
> between step 2 and step 3 accordingly.

I don't think this is true in all cases. If pcie_do_recovery() is
called with state==pci_channel_io_normal (i.e. non-fatal AER) the link
won't be reset. EEH (ppc PCI error recovery thing) also uses
.mmio_enabled() as described.
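
For reference, the relevant control flow is roughly the following,
condensed and paraphrased from drivers/pci/pcie/err.c around v5.13
(not the verbatim upstream code; bridge selection in particular is
simplified):

pci_ers_result_t pcie_do_recovery(struct pci_dev *dev, pci_channel_state_t state,
                pci_ers_result_t (*reset_subordinates)(struct pci_dev *pdev))
{
        struct pci_dev *bridge = pci_upstream_bridge(dev);
        pci_ers_result_t status;

        if (state == pci_channel_io_frozen) {
                /* Fatal error: the link is reset here; on success status
                 * becomes PCI_ERS_RESULT_RECOVERED, so the .mmio_enabled()
                 * step below is skipped. */
                pci_walk_bridge(bridge, report_frozen_detected, &status);
                status = reset_subordinates(bridge);
        } else {
                /* Non-fatal (pci_channel_io_normal): no link reset; drivers
                 * that returned PCI_ERS_RESULT_CAN_RECOVER from
                 * .error_detected() then get .mmio_enabled(). */
                pci_walk_bridge(bridge, report_normal_detected, &status);
        }

        if (status == PCI_ERS_RESULT_CAN_RECOVER)
                pci_walk_bridge(bridge, report_mmio_enabled, &status);

        /* ... slot_reset and resume steps elided ... */
        return status;
}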