Re: [PATCH] powerpc/eeh_cache: Fix a possible debugfs deadlock

2020-10-28 Thread Oliver O'Halloran
On Thu, Oct 29, 2020 at 2:27 AM Qian Cai  wrote:
>
> Lockdep complains about a possible deadlock in eeh_addr_cache_show()
> because it acquires a lock with IRQs enabled, while
> eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQs
> disabled. Let's just make eeh_addr_cache_show() acquire the lock with
> IRQs disabled as well.
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(&pci_io_addr_cache_root.piar_lock);
>                                local_irq_disable();
>                                lock(&host->lock);
>                                lock(&pci_io_addr_cache_root.piar_lock);
>   <Interrupt>
>     lock(&host->lock);
>
>   *** DEADLOCK ***
>
>   lock_acquire+0x140/0x5f0
>   _raw_spin_lock_irqsave+0x64/0xb0
>   eeh_addr_cache_insert_dev+0x48/0x390
>   eeh_probe_device+0xb8/0x1a0
>   pnv_pcibios_bus_add_device+0x3c/0x80
>   pcibios_bus_add_device+0x118/0x290
>   pci_bus_add_device+0x28/0xe0
>   pci_bus_add_devices+0x54/0xb0
>   pcibios_init+0xc4/0x124
>   do_one_initcall+0xac/0x528
>   kernel_init_freeable+0x35c/0x3fc
>   kernel_init+0x24/0x148
>   ret_from_kernel_thread+0x5c/0x80
>
>   lock_acquire+0x140/0x5f0
>   _raw_spin_lock+0x4c/0x70
>   eeh_addr_cache_show+0x38/0x110
>   seq_read+0x1a0/0x660
>   vfs_read+0xc8/0x1f0
>   ksys_read+0x74/0x130
>   system_call_exception+0xf8/0x1d0
>   system_call_common+0xe8/0x218
>
> Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache")
> Signed-off-by: Qian Cai 

Good catch,

Reviewed-by: Oliver O'Halloran 
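
For reference, a minimal sketch of the change being reviewed, assuming the
body of eeh_addr_cache_show() is otherwise unchanged (the print in the loop
is elided here):

static int eeh_addr_cache_show(struct seq_file *s, void *v)
{
	struct rb_node *n;
	unsigned long flags;

	/* was: spin_lock(&pci_io_addr_cache_root.piar_lock); */
	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
	for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n)) {
		/* print each cached pci_io_addr_range, as before */
	}
	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);

	return 0;
}

i.e. the plain spin_lock()/spin_unlock() pair becomes the _irqsave/_irqrestore
variants so the debugfs reader cannot be interrupted while holding piar_lock.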


Re: [RFC PATCH kernel 1/2] irq: Add reference counting to IRQ mappings

2020-10-28 Thread Alexey Kardashevskiy




On 28/10/2020 03:09, Marc Zyngier wrote:

Hi Alexey,

On 2020-10-27 09:06, Alexey Kardashevskiy wrote:

PCI devices share 4 legacy INTx interrupts from the same PCI host bridge.
Device drivers map/unmap hardware interrupts via irq_create_mapping()/
irq_dispose_mapping(). The problem is that these interrupts are
shared, and when performing hot unplug we need to unmap the interrupt
only when the last device is released.

This reuses the already existing irq_desc::kobj for this purpose.
The reference counter is naturally 1 when the descriptor is first
allocated; this adds kobject_get() in places where an already existing
mapped virq is returned.


That's quite interesting, as I was about to revive a patch series that
rework the irqdomain subsystem to directly cache irq_desc instead of
raw interrupt numbers. And for that, I needed some form of refcounting...



This reorganizes irq_dispose_mapping() to release the kobj and let
the release callback do the cleanup.

If some driver or platform does its own reference counting, this expects
those parties to call irq_find_mapping() and call irq_dispose_mapping()
for every irq_create_fwspec_mapping()/irq_create_mapping().
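
In other words, the expected balancing under the proposed scheme is (a
hypothetical sketch, not taken from any driver):

	/* each create takes a reference, even when the mapping already exists */
	virq = irq_create_mapping(domain, hwirq);	/* refcount N -> N + 1 */

	/* lookups do not take a reference */
	virq = irq_find_mapping(domain, hwirq);

	/* every create must be balanced by a dispose */
	irq_dispose_mapping(virq);			/* refcount N + 1 -> N */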

Signed-off-by: Alexey Kardashevskiy 
---
 kernel/irq/irqdesc.c   | 35 +++
 kernel/irq/irqdomain.c | 27 +--
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 1a7723604399..dae096238500 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -419,20 +419,39 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,
 return NULL;
 }

+static void delayed_free_desc(struct rcu_head *rhp);
 static void irq_kobj_release(struct kobject *kobj)
 {
 struct irq_desc *desc = container_of(kobj, struct irq_desc, kobj);
+    struct irq_domain *domain;
+    unsigned int virq = desc->irq_data.irq;

-    free_masks(desc);
-    free_percpu(desc->kstat_irqs);
-    kfree(desc);
+    domain = desc->irq_data.domain;
+    if (domain) {
+    if (irq_domain_is_hierarchy(domain)) {
+    irq_domain_free_irqs(virq, 1);


How does this work with hierarchical domains? Each domain should
contribute as a reference on the irq_desc. But if you got here,
it means the refcount has already dropped to 0.

So either there is nothing to free here, or you don't track the
references implied by the hierarchy. I suspect the latter.


This is correct, I did not look at hierarchy yet, looking now...




+    } else {
+    irq_domain_disassociate(domain, virq);
+    irq_free_desc(virq);
+    }
+    }
+
+    /*
+ * We free the descriptor, masks and stat fields via RCU. That
+ * allows demultiplex interrupts to do rcu based management of
+ * the child interrupts.
+ * This also allows us to use rcu in kstat_irqs_usr().
+ */
+    call_rcu(&desc->rcu, delayed_free_desc);
 }

 static void delayed_free_desc(struct rcu_head *rhp)
 {
 struct irq_desc *desc = container_of(rhp, struct irq_desc, rcu);

-    kobject_put(&desc->kobj);
+    free_masks(desc);
+    free_percpu(desc->kstat_irqs);
+    kfree(desc);
 }

 static void free_desc(unsigned int irq)
@@ -453,14 +472,6 @@ static void free_desc(unsigned int irq)
  */
 irq_sysfs_del(desc);
 delete_irq_desc(irq);
-
-    /*
- * We free the descriptor, masks and stat fields via RCU. That
- * allows demultiplex interrupts to do rcu based management of
- * the child interrupts.
- * This also allows us to use rcu in kstat_irqs_usr().
- */
-    call_rcu(&desc->rcu, delayed_free_desc);
 }

 static int alloc_descs(unsigned int start, unsigned int cnt, int node,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..02733ddc321f 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -638,6 +638,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 {
 struct device_node *of_node;
 int virq;
+    struct irq_desc *desc;

 pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq);

@@ -655,7 +656,9 @@ unsigned int irq_create_mapping(struct irq_domain *domain,

 /* Check if mapping already exists */
 virq = irq_find_mapping(domain, hwirq);
 if (virq) {
+    desc = irq_to_desc(virq);
 pr_debug("-> existing mapping on virq %d\n", virq);
+    kobject_get(&desc->kobj);


My worry with this is that there is probably a significant amount of
code out there that relies on multiple calls to irq_create_mapping()
with the same parameters not to have any side effects. They would
expect a subsequent irq_dispose_mapping() to drop the translation
altogether, and that's obviously not the case here.

Have you audited the various call sites to see what could break?
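
Concretely, a (hypothetical) caller like this used to be left with no
mapping after the dispose, but with refcounting it leaks one reference
and the translation stays cached:

	virq = irq_create_mapping(domain, hwirq);
	virq = irq_create_mapping(domain, hwirq);	/* same hwirq, e.g. a retry path */
	irq_dispose_mapping(virq);			/* drops only one reference */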



The vast majority call one of the irq_*_create_mapping() helpers in
init/probe and then call irq_dispose_mapping() right there if probing
failed, or when the driver is unloaded. I could not spot any reference
counting anywhere,

[PATCH kernel v4 1/2] dma: Allow mixing bypass and mapped DMA operation

2020-10-28 Thread Alexey Kardashevskiy
At the moment we allow bypassing DMA ops only when we can do this for
the entire RAM. However there are configs with mixed type memory
where we could still allow bypassing IOMMU in most cases;
POWERPC with persistent memory is one example.

This adds an arch hook to determine where bypass can still work and
we invoke direct DMA API. The following patch checks the bus limit
on POWERPC to allow or disallow direct mapping.

This adds a CONFIG_ARCH_HAS_DMA_MAP_DIRECT config option to make the
arch_* hooks no-ops by default.

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v4:
* wrapped long lines
---
 kernel/dma/mapping.c | 26 ++
 kernel/dma/Kconfig   |  4 
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 51bb8fa8eb89..ad1f052e046d 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -137,6 +137,20 @@ static inline bool dma_map_direct(struct device *dev,
return dma_go_direct(dev, *dev->dma_mask, ops);
 }
 
+#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
+bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
+   int nents);
+bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
+ int nents);
+#else
+#define arch_dma_map_page_direct(d, a) (0)
+#define arch_dma_unmap_page_direct(d, a) (0)
+#define arch_dma_map_sg_direct(d, s, n) (0)
+#define arch_dma_unmap_sg_direct(d, s, n) (0)
+#endif
+
 dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
size_t offset, size_t size, enum dma_data_direction dir,
unsigned long attrs)
@@ -149,7 +163,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
if (WARN_ON_ONCE(!dev->dma_mask))
return DMA_MAPPING_ERROR;
 
-   if (dma_map_direct(dev, ops))
+   if (dma_map_direct(dev, ops) ||
+   arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -165,7 +180,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
const struct dma_map_ops *ops = get_dma_ops(dev);
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_map_direct(dev, ops))
+   if (dma_map_direct(dev, ops) ||
+   arch_dma_unmap_page_direct(dev, addr + size))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (ops->unmap_page)
ops->unmap_page(dev, addr, size, dir, attrs);
@@ -188,7 +204,8 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
if (WARN_ON_ONCE(!dev->dma_mask))
return 0;
 
-   if (dma_map_direct(dev, ops))
+   if (dma_map_direct(dev, ops) ||
+   arch_dma_map_sg_direct(dev, sg, nents))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -207,7 +224,8 @@ void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 
BUG_ON(!valid_dma_direction(dir));
debug_dma_unmap_sg(dev, sg, nents, dir);
-   if (dma_map_direct(dev, ops))
+   if (dma_map_direct(dev, ops) ||
+   arch_dma_unmap_sg_direct(dev, sg, nents))
dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
else if (ops->unmap_sg)
ops->unmap_sg(dev, sg, nents, dir, attrs);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..43d106598e82 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -20,6 +20,10 @@ config DMA_OPS
 config DMA_OPS_BYPASS
bool
 
+# Lets platform IOMMU driver choose between bypass and IOMMU
+config ARCH_HAS_DMA_MAP_DIRECT
+   bool
+
 config NEED_SG_DMA_LENGTH
bool
 
-- 
2.17.1



[PATCH kernel v4 2/2] powerpc/dma: Fallback to dma_ops when persistent memory present

2020-10-28 Thread Alexey Kardashevskiy
So far we have been using huge DMA windows to map all the RAM available.
The RAM is normally mapped to the VM address space contiguously, and
there is always a reasonable upper limit for possible future hot plugged
RAM which makes it easy to map all RAM via IOMMU.

Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike
normal RAM) can be mapped anywhere in the VM space beyond the maximum RAM
size, and since it can be used for DMA, it requires extending the huge
window up to MAX_PHYSMEM_BITS, which requires hypervisor support for:
1. huge TCE tables;
2. multilevel TCE tables;
3. huge IOMMU pages.

Certain hypervisors cannot do all of these, so the only option left is
restricting the huge DMA window to include only RAM and falling back to
the default DMA window for persistent memory.

This defines arch_dma_map_page_direct() etc. to allow the generic DMA
code to perform additional checks on whether direct DMA is still possible.

This checks if the system has persistent memory. If it does not,
the DMA bypass mode is selected, i.e.
* dev->bus_dma_limit = 0
* dev->dma_ops_bypass = true <- this avoids calling dma_ops for mapping.

If there is such memory, this creates identity mapping only for RAM and
sets the dev->bus_dma_limit to let the generic code decide whether to
call into the direct DMA or the indirect DMA ops.

This should not change the existing behaviour when there is no persistent
memory, as dev->dma_ops_bypass is expected to be set.
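
Condensed, the decision on the map path then looks like this (an
illustration based on the dma_map_page_attrs() hunk in the previous patch;
'end' stands for the physical address of the end of the mapping):

	if (dma_map_direct(dev, ops) ||		/* dma_ops_bypass == true */
	    arch_dma_map_page_direct(dev, end))	/* end fits under bus_dma_limit */
		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
	else
		addr = ops->map_page(dev, page, offset, size, dir, attrs);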

Signed-off-by: Alexey Kardashevskiy 
---
Changes:
v4:
* fixed leaked device_node
* wrapped long lines
---
 arch/powerpc/kernel/dma-iommu.c| 73 +-
 arch/powerpc/platforms/pseries/iommu.c | 51 ++
 arch/powerpc/Kconfig   |  1 +
 3 files changed, 113 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index a1c744194018..e5e9e5e3e3ca 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -10,6 +10,65 @@
 #include 
 #include 
 
+#define can_map_direct(dev, addr) \
+   ((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))
+
+bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+{
+   if (likely(!dev->bus_dma_limit))
+   return false;
+
+   return can_map_direct(dev, addr);
+}
+EXPORT_SYMBOL_GPL(arch_dma_map_page_direct);
+
+#define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)
+
+bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+{
+   if (likely(!dev->bus_dma_limit))
+   return false;
+
+   return is_direct_handle(dev, dma_handle);
+}
+EXPORT_SYMBOL_GPL(arch_dma_unmap_page_direct);
+
+bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
+   int nents)
+{
+   struct scatterlist *s;
+   int i;
+
+   if (likely(!dev->bus_dma_limit))
+   return false;
+
+   for_each_sg(sg, s, nents, i) {
+   if (!can_map_direct(dev, sg_phys(s) + s->offset + s->length))
+   return false;
+   }
+
+   return true;
+}
+EXPORT_SYMBOL(arch_dma_map_sg_direct);
+
+bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
+ int nents)
+{
+   struct scatterlist *s;
+   int i;
+
+   if (likely(!dev->bus_dma_limit))
+   return false;
+
+   for_each_sg(sg, s, nents, i) {
+   if (!is_direct_handle(dev, s->dma_address + s->length))
+   return false;
+   }
+
+   return true;
+}
+EXPORT_SYMBOL(arch_dma_unmap_sg_direct);
+
 /*
  * Generic iommu implementation
  */
@@ -90,8 +149,18 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
struct iommu_table *tbl = get_iommu_table_base(dev);
 
if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
-   dev->dma_ops_bypass = true;
-   dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
+   /*
+* dma_iommu_bypass_supported() sets dma_max when there is
+* 1:1 mapping but it is somehow limited.
+* ibm,pmemory is one example.
+*/
+   dev->dma_ops_bypass = dev->bus_dma_limit == 0;
+   if (!dev->dma_ops_bypass)
+   dev_warn(dev,
+"iommu: 64-bit OK but direct DMA is limited by 
%llx\n",
+dev->bus_dma_limit);
+   else
+   dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
return 1;
}
 
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e4198700ed1a..9fc5217f0c8e 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -839,7 +839,7 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
np, ret);
 }
 
-static u64 

[PATCH kernel v4 0/2] DMA, powerpc/dma: Fallback to dma_ops when persistent memory present

2020-10-28 Thread Alexey Kardashevskiy


This allows mixing direct DMA (to/from RAM) and
IOMMU (to/from persistent memory) on the PPC64/pseries
platform.

This replaces https://lkml.org/lkml/2020/10/28/929
which replaces https://lkml.org/lkml/2020/10/27/418
which replaces https://lkml.org/lkml/2020/10/20/1085


This is based on sha1
4525c8781ec0 Linus Torvalds "scsi: qla2xxx: remove incorrect sparse #ifdef".

Please comment. Thanks.



Alexey Kardashevskiy (2):
  dma: Allow mixing bypass and mapped DMA operation
  powerpc/dma: Fallback to dma_ops when persistent memory present

 arch/powerpc/kernel/dma-iommu.c| 73 +-
 arch/powerpc/platforms/pseries/iommu.c | 51 ++
 kernel/dma/mapping.c   | 26 +++--
 arch/powerpc/Kconfig   |  1 +
 kernel/dma/Kconfig |  4 ++
 5 files changed, 139 insertions(+), 16 deletions(-)

-- 
2.17.1



Re: [PATCH] ibmvscsi: fix potential race after loss of transport

2020-10-28 Thread Tyrel Datwyler
On 10/27/20 10:21 PM, Michael Ellerman wrote:
> Tyrel Datwyler  writes:
>> After a loss of transport due to an adapter migration or crash/disconnect from
>> the host partner there is a tiny window where we can race adjusting the
>> request_limit of the adapter. The request limit is atomically inc/dec'd to
>> track the number of inflight requests against the allowed limit of our VIOS
>> partner. After a transport loss we set the request_limit to zero to reflect
>> this state. However, there is a window where the adapter may attempt to queue
>> a command because the transport loss event hasn't been fully processed yet and
>> request_limit is still greater than zero. The hypercall to send the event will
>> fail and the error path will increment the request_limit as a result. If the
>> adapter processes the transport event prior to this increment, the
>> request_limit becomes out of sync with the adapter state and can result in
>> scsi commands being submitted on the now reset connection prior to an SRP
>> Login, resulting in a protocol violation.
>>
>> Fix this race by protecting request_limit with the host lock when changing
>> the value via atomic_set() to indicate no transport.
>>
>> Signed-off-by: Tyrel Datwyler 
>> ---
>>  drivers/scsi/ibmvscsi/ibmvscsi.c | 36 +++-
>>  1 file changed, 26 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
>> index b1f3017b6547..188ed75417a5 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
>> @@ -806,6 +806,22 @@ static void purge_requests(struct ibmvscsi_host_data *hostdata, int error_code)
>>  spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>>  }
>>  
>> +/**
>> + * ibmvscsi_set_request_limit - Set the adapter request_limit in response to
>> + * an adapter failure, reset, or SRP Login. Done under host lock to prevent
>> + * race with scsi command submission.
>> + * @hostdata:   adapter to adjust
>> + * @limit:  new request limit
>> + */
>> +static void ibmvscsi_set_request_limit(struct ibmvscsi_host_data *hostdata, int limit)
>> +{
>> +unsigned long flags;
>> +
>> +spin_lock_irqsave(hostdata->host->host_lock, flags);
>> +atomic_set(&hostdata->request_limit, limit);
>> +spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>> +}
>> +
>>  /**
>>   * ibmvscsi_reset_host - Reset the connection to the server
>>   * @hostdata:   struct ibmvscsi_host_data to reset
> ...
>> @@ -2137,12 +2153,12 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata)
>>  }
>>  
>>  hostdata->action = IBMVSCSI_HOST_ACTION_NONE;
>> +spin_unlock_irqrestore(hostdata->host->host_lock, flags);
> 
> You drop the lock ...
> 
>>  if (rc) {
>> -atomic_set(&hostdata->request_limit, -1);
>> +ibmvscsi_set_request_limit(hostdata, -1);
> 
> .. then retake it, then drop it again in ibmvscsi_set_request_limit().
> 
> Which introduces the possibility that something else gets the lock
> before you can set the limit to -1.
> 
> I'm not sure that's a bug, but it's not obviously correct either?

Yeah, I'd already moved the request_limit update into its own function before I
got to this case which made me a bit uneasy when I realized I had to drop the
lock because my new function takes the lock. However, we only need to protect
ourselves from from racing with queuecommand() which is locked for its entire
call. Further, if we've gotten here it means we were either resetting or
re-enabling the adapter which would have already set request_limit to zero. At
this point the transport was already gone and we've further failed to reset it.
Also, we've blocked any new scsi requests at this point.
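
(For context: the submission path goes through the DEF_SCSI_QCMD() wrapper,
which holds host_lock for the whole call — a paraphrased expansion, not the
literal macro output:

static int ibmvscsi_queuecommand(struct Scsi_Host *shost, struct scsi_cmnd *cmd)
{
	unsigned long irq_flags;
	int rc;

	spin_lock_irqsave(shost->host_lock, irq_flags);
	rc = ibmvscsi_queuecommand_lck(cmd, cmd->scsi_done);
	spin_unlock_irqrestore(shost->host_lock, irq_flags);
	return rc;
}

so an atomic_set() done under host_lock cannot interleave with a command
being queued.)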

-Tyrel

> 
> cheers
> 
>>  dev_err(hostdata->dev, "error after %s\n", action);
>>  }
>> -spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>>  
>>  scsi_unblock_requests(hostdata->host);
>>  }



Re: [PATCH kernel v3 2/2] powerpc/dma: Fallback to dma_ops when persistent memory present

2020-10-28 Thread Alexey Kardashevskiy




On 29/10/2020 11:40, Michael Ellerman wrote:

Alexey Kardashevskiy  writes:

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e4198700ed1a..91112e748491 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1111,11 +1112,13 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
   */
  static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
  {
-   int len, ret;
+   int len = 0, ret;
+   bool pmem_present = of_find_node_by_type(NULL, "ibm,pmemory") != NULL;


That leaks a reference on the returned node.

dn = of_find_node_by_type(NULL, "ibm,pmemory");
pmem_present = dn != NULL;
of_node_put(dn);



ah, true. v4 then.






@@ -1126,7 +1129,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
  
  	mutex_lock(&direct_window_init_mutex);
  
-	dma_addr = find_existing_ddw(pdn);

+   dma_addr = find_existing_ddw(pdn, &len);


I don't see len used anywhere?


if (dma_addr != 0)
goto out_unlock;
  
@@ -1212,14 +1215,26 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)

}
/* verify the window * number of ptes will map the partition */
/* check largest block * page size > max memory hotplug addr */
-   max_addr = ddw_memory_hotplug_max();
-   if (query.largest_available_block < (max_addr >> page_shift)) {
-   dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
- "%llu-sized pages\n", max_addr, query.largest_available_block,
- 1ULL << page_shift);
+   /*
+* The "ibm,pmemory" can appear anywhere in the address space.
+* Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
+* for the upper limit and fallback to max RAM otherwise but this
+* disables device::dma_ops_bypass.
+*/
+   len = max_ram_len;


Here you override whatever find_existing_ddw() wrote to len?



Not always, there is a bunch of gotos before this line to the end of the 
function and one (which returns the existing window) is legit. Thanks,
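
To illustrate, a condensed flow of enable_ddw() with the patch applied —
the early-return path keeps whatever find_existing_ddw() wrote to len:

	mutex_lock(&direct_window_init_mutex);

	dma_addr = find_existing_ddw(pdn, &len);
	if (dma_addr != 0)
		goto out_unlock;	/* len of the existing window is kept */

	/* ... creating a new window: only here is len recomputed ... */
	len = max_ram_len;
	if (pmem_present && query.largest_available_block >=
	    (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
		len = MAX_PHYSMEM_BITS - page_shift;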







+   if (pmem_present) {
+   if (query.largest_available_block >=
+   (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
+   len = MAX_PHYSMEM_BITS - page_shift;
+   else
+   dev_info(&dev->dev, "Skipping ibm,pmemory");
+   }
+
+   if (query.largest_available_block < (1ULL << (len - page_shift))) {
+   dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu %llu-sized pages\n",
+   1ULL << len, query.largest_available_block, 1ULL << page_shift);
goto out_failed;
}
-   len = order_base_2(max_addr);
win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
if (!win64) {
dev_info(&dev->dev,



cheers



--
Alexey


Re: [PATCH kernel v3 2/2] powerpc/dma: Fallback to dma_ops when persistent memory present

2020-10-28 Thread Michael Ellerman
Alexey Kardashevskiy  writes:
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index e4198700ed1a..91112e748491 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -1111,11 +1112,13 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
>   */
>  static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>  {
> - int len, ret;
> + int len = 0, ret;
> + bool pmem_present = of_find_node_by_type(NULL, "ibm,pmemory") != NULL;

That leaks a reference on the returned node.

dn = of_find_node_by_type(NULL, "ibm,pmemory");
pmem_present = dn != NULL;
of_node_put(dn);


> @@ -1126,7 +1129,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>  
>   mutex_lock(&direct_window_init_mutex);
>  
> - dma_addr = find_existing_ddw(pdn);
> + dma_addr = find_existing_ddw(pdn, &len);

I don't see len used anywhere?

>   if (dma_addr != 0)
>   goto out_unlock;
>  
> @@ -1212,14 +1215,26 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>   }
>   /* verify the window * number of ptes will map the partition */
>   /* check largest block * page size > max memory hotplug addr */
> - max_addr = ddw_memory_hotplug_max();
> - if (query.largest_available_block < (max_addr >> page_shift)) {
> - dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
> -   "%llu-sized pages\n", max_addr, query.largest_available_block,
> -   1ULL << page_shift);
> + /*
> +  * The "ibm,pmemory" can appear anywhere in the address space.
> +  * Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
> +  * for the upper limit and fallback to max RAM otherwise but this
> +  * disables device::dma_ops_bypass.
> +  */
> + len = max_ram_len;

Here you override whatever find_existing_ddw() wrote to len?

> + if (pmem_present) {
> + if (query.largest_available_block >=
> + (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
> + len = MAX_PHYSMEM_BITS - page_shift;
> + else
> + dev_info(&dev->dev, "Skipping ibm,pmemory");
> + }
> +
> + if (query.largest_available_block < (1ULL << (len - page_shift))) {
> + dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu %llu-sized pages\n",
> + 1ULL << len, query.largest_available_block, 1ULL << page_shift);
>   goto out_failed;
>   }
> - len = order_base_2(max_addr);
>   win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
>   if (!win64) {
> dev_info(&dev->dev,


cheers


Re: [PATCH] powerpc/smp: Move rcu_cpu_starting() earlier

2020-10-28 Thread Paul E. McKenney
On Thu, Oct 29, 2020 at 11:09:07AM +1100, Michael Ellerman wrote:
> Qian Cai  writes:
> > The call to rcu_cpu_starting() in start_secondary() is not early enough
> > in the CPU-hotplug onlining process, which results in lockdep splats as
> > follows:
> 
> Since when?
> What kernel version?
> 
> I haven't seen this running CPU hotplug tests with PROVE_LOCKING=y on
> v5.10-rc1. Am I missing a CONFIG?

My guess would be that adding CONFIG_PROVE_RAW_LOCK_NESTING=y will
get you some splats.

Thanx, Paul

> cheers
> 
> 
> >  WARNING: suspicious RCU usage
> >  -
> >  kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
> >
> >  other info that might help us debug this:
> >
> >  RCU used illegally from offline CPU!
> >  rcu_scheduler_active = 1, debug_locks = 1
> >  no locks held by swapper/1/0.
> >
> >  Call Trace:
> >  dump_stack+0xec/0x144 (unreliable)
> >  lockdep_rcu_suspicious+0x128/0x14c
> >  __lock_acquire+0x1060/0x1c60
> >  lock_acquire+0x140/0x5f0
> >  _raw_spin_lock_irqsave+0x64/0xb0
> >  clockevents_register_device+0x74/0x270
> >  register_decrementer_clockevent+0x94/0x110
> >  start_secondary+0x134/0x800
> >  start_secondary_prolog+0x10/0x14
> >
> > This is avoided by moving the call to rcu_cpu_starting up near the
> > beginning of the start_secondary() function. Note that the
> > raw_smp_processor_id() is required in order to avoid calling into
> > lockdep before RCU has declared the CPU to be watched for readers.
> >
> > Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
> > Signed-off-by: Qian Cai 
> > ---
> >  arch/powerpc/kernel/smp.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> > index 3c6b9822f978..8c2857cbd960 100644
> > --- a/arch/powerpc/kernel/smp.c
> > +++ b/arch/powerpc/kernel/smp.c
> > @@ -1393,13 +1393,14 @@ static void add_cpu_to_masks(int cpu)
> >  /* Activate a secondary processor. */
> >  void start_secondary(void *unused)
> >  {
> > -   unsigned int cpu = smp_processor_id();
> > +   unsigned int cpu = raw_smp_processor_id();
> >  
> > mmgrab(&init_mm);
> > current->active_mm = &init_mm;
> >  
> > smp_store_cpu_info(cpu);
> > set_dec(tb_ticks_per_jiffy);
> > +   rcu_cpu_starting(cpu);
> > preempt_disable();
> > cpu_callin_map[cpu] = 1;
> >  
> > -- 
> > 2.28.0


Re: [PATCH 01/13] PCI: dwc/imx6: Drop setting PCI_MSI_FLAGS_ENABLE

2020-10-28 Thread Michael Ellerman
Rob Herring  writes:
> No other host driver sets the PCI_MSI_FLAGS_ENABLE bit, so it must not
> be necessary. If it is, a comment is needed.

Yeah, but git blame directly points to:

  75cb8d20c112 ("PCI: imx: Enable MSI from downstream components")

Which has a pretty long explanation. The relevant bit probably being:

  ... on i.MX6, the MSI Enable bit controls delivery of MSI interrupts
  from components below the Root Port.


So it seems a little rash to just remove the code.

cheers


> Cc: Richard Zhu 
> Cc: Lucas Stach 
> Cc: Lorenzo Pieralisi 
> Cc: Bjorn Helgaas 
> Cc: Shawn Guo 
> Cc: Sascha Hauer 
> Cc: Pengutronix Kernel Team 
> Cc: Fabio Estevam 
> Cc: NXP Linux Team 
> Signed-off-by: Rob Herring 
> ---
>  drivers/pci/controller/dwc/pci-imx6.c | 8 
>  1 file changed, 8 deletions(-)
>
> diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
> index 5cf1ef12fb9b..7dd137d62dca 100644
> --- a/drivers/pci/controller/dwc/pci-imx6.c
> +++ b/drivers/pci/controller/dwc/pci-imx6.c
> @@ -1002,7 +1002,6 @@ static int imx6_pcie_probe(struct platform_device *pdev)
>   struct resource *dbi_base;
>   struct device_node *node = dev->of_node;
>   int ret;
> - u16 val;
>  
>   imx6_pcie = devm_kzalloc(dev, sizeof(*imx6_pcie), GFP_KERNEL);
>   if (!imx6_pcie)
> @@ -1167,13 +1166,6 @@ static int imx6_pcie_probe(struct platform_device *pdev)
>   if (ret < 0)
>   return ret;
>  
> - if (pci_msi_enabled()) {
> - u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
> - val = dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
> - val |= PCI_MSI_FLAGS_ENABLE;
> - dw_pcie_writew_dbi(pci, offset + PCI_MSI_FLAGS, val);
> - }
> -
>   return 0;
>  }
>  
> -- 
> 2.25.1


Re: [PATCH] powerpc/smp: Move rcu_cpu_starting() earlier

2020-10-28 Thread Michael Ellerman
Qian Cai  writes:
> The call to rcu_cpu_starting() in start_secondary() is not early enough
> in the CPU-hotplug onlining process, which results in lockdep splats as
> follows:

Since when?
What kernel version?

I haven't seen this running CPU hotplug tests with PROVE_LOCKING=y on
v5.10-rc1. Am I missing a CONFIG?

cheers


>  WARNING: suspicious RCU usage
>  -
>  kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
>
>  other info that might help us debug this:
>
>  RCU used illegally from offline CPU!
>  rcu_scheduler_active = 1, debug_locks = 1
>  no locks held by swapper/1/0.
>
>  Call Trace:
>  dump_stack+0xec/0x144 (unreliable)
>  lockdep_rcu_suspicious+0x128/0x14c
>  __lock_acquire+0x1060/0x1c60
>  lock_acquire+0x140/0x5f0
>  _raw_spin_lock_irqsave+0x64/0xb0
>  clockevents_register_device+0x74/0x270
>  register_decrementer_clockevent+0x94/0x110
>  start_secondary+0x134/0x800
>  start_secondary_prolog+0x10/0x14
>
> This is avoided by moving the call to rcu_cpu_starting up near the
> beginning of the start_secondary() function. Note that the
> raw_smp_processor_id() is required in order to avoid calling into
> lockdep before RCU has declared the CPU to be watched for readers.
>
> Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
> Signed-off-by: Qian Cai 
> ---
>  arch/powerpc/kernel/smp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 3c6b9822f978..8c2857cbd960 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1393,13 +1393,14 @@ static void add_cpu_to_masks(int cpu)
>  /* Activate a secondary processor. */
>  void start_secondary(void *unused)
>  {
> - unsigned int cpu = smp_processor_id();
> + unsigned int cpu = raw_smp_processor_id();
>  
> mmgrab(&init_mm);
> current->active_mm = &init_mm;
>  
>   smp_store_cpu_info(cpu);
>   set_dec(tb_ticks_per_jiffy);
> + rcu_cpu_starting(cpu);
>   preempt_disable();
>   cpu_callin_map[cpu] = 1;
>  
> -- 
> 2.28.0


Re: [PATCH 0/4] Powerpc: Better preemption for shared processor

2020-10-28 Thread Waiman Long

On 10/28/20 8:35 AM, Srikar Dronamraju wrote:

Currently, vcpu_is_preempted will return the yield_count for
shared_processor. On a PowerVM LPAR, Phyp schedules at SMT8 core boundary,
i.e. all CPUs belonging to a core are either group scheduled in or group
scheduled out. This can be used to better predict non-preempted CPUs on
PowerVM shared LPARs.
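
The gist of the change in patch 4 is roughly the following (a reconstruction
for illustration, not the literal patch):

static inline bool vcpu_is_preempted(int cpu)
{
	if (!is_shared_processor())
		return false;

	/*
	 * On PowerVM (i.e. not a KVM guest), Phyp dispatches at SMT8 core
	 * granularity, so a CPU on the same core as the caller cannot be
	 * preempted while the caller itself is running.
	 */
	if (!is_kvm_guest()) {
		int first_cpu = cpu_first_thread_sibling(smp_processor_id());

		if (cpu_first_thread_sibling(cpu) == first_cpu)
			return false;
	}

	return yield_count_of(cpu) & 1;
}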

perf stat -r 5 -a perf bench sched pipe -l 1000 (lesser time is better)

powerpc/next
      35,107,951.20 msec cpu-clock               #  255.898 CPUs utilized           ( +-  0.31% )
         23,655,348      context-switches        #    0.674 K/sec                   ( +-  3.72% )
             14,465      cpu-migrations          #    0.000 K/sec                   ( +-  5.37% )
             82,463      page-faults             #    0.002 K/sec                   ( +-  8.40% )
  1,127,182,328,206      cycles                  #    0.032 GHz                     ( +-  1.60% )  (66.67%)
     78,587,300,622      stalled-cycles-frontend #    6.97% frontend cycles idle    ( +-  0.08% )  (50.01%)
    654,124,218,432      stalled-cycles-backend  #   58.03% backend cycles idle     ( +-  1.74% )  (50.01%)
    834,013,059,242      instructions            #    0.74  insn per cycle
                                                 #    0.78  stalled cycles per insn ( +-  0.73% )  (66.67%)
    132,911,454,387      branches                #    3.786 M/sec                   ( +-  0.59% )  (50.00%)
      2,890,882,143      branch-misses           #    2.18% of all branches         ( +-  0.46% )  (50.00%)

137.195 +- 0.419 seconds time elapsed  ( +-  0.31% )

powerpc/next + patchset
      29,981,702.64 msec cpu-clock               #  255.881 CPUs utilized           ( +-  1.30% )
         40,162,456      context-switches        #    0.001 M/sec                   ( +-  0.01% )
              1,110      cpu-migrations          #    0.000 K/sec                   ( +-  5.20% )
             62,616      page-faults             #    0.002 K/sec                   ( +-  3.93% )
  1,430,030,626,037      cycles                  #    0.048 GHz                     ( +-  1.41% )  (66.67%)
     83,202,707,288      stalled-cycles-frontend #    5.82% frontend cycles idle    ( +-  0.75% )  (50.01%)
    744,556,088,520      stalled-cycles-backend  #   52.07% backend cycles idle     ( +-  1.39% )  (50.01%)
    940,138,418,674      instructions            #    0.66  insn per cycle
                                                 #    0.79  stalled cycles per insn ( +-  0.51% )  (66.67%)
    146,452,852,283      branches                #    4.885 M/sec                   ( +-  0.80% )  (50.00%)
      3,237,743,996      branch-misses           #    2.21% of all branches         ( +-  1.18% )  (50.01%)

 117.17 +- 1.52 seconds time elapsed  ( +-  1.30% )

This is around 14.6% improvement in performance.

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: Gautham R Shenoy 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Juri Lelli 
Cc: Waiman Long 
Cc: Phil Auld 

Srikar Dronamraju (4):
   powerpc: Refactor is_kvm_guest declaration to new header
   powerpc: Rename is_kvm_guest to check_kvm_guest
   powerpc: Reintroduce is_kvm_guest
   powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted

  arch/powerpc/include/asm/firmware.h  |  6 --
  arch/powerpc/include/asm/kvm_guest.h | 25 +
  arch/powerpc/include/asm/kvm_para.h  |  2 +-
  arch/powerpc/include/asm/paravirt.h  | 18 ++
  arch/powerpc/kernel/firmware.c   |  5 -
  arch/powerpc/platforms/pseries/smp.c |  3 ++-
  6 files changed, 50 insertions(+), 9 deletions(-)
  create mode 100644 arch/powerpc/include/asm/kvm_guest.h


This patch series looks good to me and the performance is nice too.

Acked-by: Waiman Long 

Just curious: is the performance gain mainly from the use of static_branch
(patches 1-3) or from reducing the calls to yield_count_of()?


Cheers,
Longman



Re: [PATCH kernel v3 1/2] dma: Allow mixing bypass and mapped DMA operation

2020-10-28 Thread Alexey Kardashevskiy




On 29/10/2020 04:22, Christoph Hellwig wrote:

On Wed, Oct 28, 2020 at 06:00:29PM +1100, Alexey Kardashevskiy wrote:

At the moment we allow bypassing DMA ops only when we can do this for
the entire RAM. However there are configs with mixed type memory
where we could still allow bypassing IOMMU in most cases;
POWERPC with persistent memory is one example.

This adds an arch hook to determine where bypass can still work and
we invoke direct DMA API. The following patch checks the bus limit
on POWERPC to allow or disallow direct mapping.

This adds a CONFIG_ARCH_HAS_DMA_MAP_DIRECT config option to make the
arch_* hooks no-ops by default.

Signed-off-by: Alexey Kardashevskiy 
---
  kernel/dma/mapping.c | 24 
  kernel/dma/Kconfig   |  4 
  2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 51bb8fa8eb89..a0bc9eb876ed 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -137,6 +137,18 @@ static inline bool dma_map_direct(struct device *dev,
return dma_go_direct(dev, *dev->dma_mask, ops);
  }
  
+#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
+bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, int nents);
+bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, int nents);
+#else
+#define arch_dma_map_page_direct(d, a) (0)
+#define arch_dma_unmap_page_direct(d, a) (0)
+#define arch_dma_map_sg_direct(d, s, n) (0)
+#define arch_dma_unmap_sg_direct(d, s, n) (0)
+#endif


A bunch of overly long lines here.  Except for that this looks ok to me.
If you want me to queue up the series I can just fix it up.


I thought 100 is the new limit since
https://lkml.org/lkml/2020/5/29/1038 (yeah, that mentioned some Christoph
:)) and making these multiline does not make a huge difference, but feel
free to fix them up.


Are you going to take both patches? Do you need mpe's ack? Thanks,


--
Alexey


Re: [PATCH kernel v2 1/2] dma: Allow mixing bypass and normal IOMMU operation

2020-10-28 Thread Alexey Kardashevskiy




On 29/10/2020 04:21, Christoph Hellwig wrote:

On Wed, Oct 28, 2020 at 05:55:23PM +1100, Alexey Kardashevskiy wrote:


It is passing the address of the end of the mapped area, so passing a page
struct means passing a page and an offset, which is an extra parameter, and
we do not want to do anything with the page in those hooks anyway, so I'd
keep it as is.



and
 maybe even hide the dma_map_direct inside it.


Call dma_map_direct() from arch_dma_map_page_direct() if
arch_dma_map_page_direct() is defined? Seems suboptimal as it is going to
be bypass=true in most cases and we save one call by avoiding calling
arch_dma_map_page_direct(). Unless I missed something?


C does not even evaluate the right hand side of a || expression if the
left hand evaluates to true.
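
I.e. in the map path the arch hook is only ever evaluated when bypass was
not already decided (taken from the dma_map_page_attrs() hunk above):

	if (dma_map_direct(dev, ops) ||	/* when true, the hook is never called */
	    arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);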


Right, this is what I meant. dma_map_direct() is inline and fast so I 
did not want it inside the arch hook which is not inline.



--
Alexey


Re: [PATCH 2/4] PM: hibernate: improve robustness of mapping pages in the direct map

2020-10-28 Thread Edgecombe, Rick P
On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote:
> +   if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
> +   unsigned long addr = (unsigned long)page_address(page);
> +   int ret;
> +
> +   if (enable)
> +   ret = set_direct_map_default_noflush(page);
> +   else
> +   ret = set_direct_map_invalid_noflush(page);
> +
> +   if (WARN_ON(ret))
> +   return;
> +
> +   flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +   } else {
> +   debug_pagealloc_map_pages(page, 1, enable);
> +   }

Looking at the arm side again, I think this might actually introduce a
regression for the arm/hibernate/DEBUG_PAGEALLOC combo.

Unlike __kernel_map_pages(), it looks like arm's cpa will always bail
in the set_direct_map_() functions if rodata_full is false. So if
rodata_full was disabled but debug page alloc is on, then this would
now skip remapping the pages. I guess the significance depends on
whether hibernate could actually try to save any DEBUG_PAGEALLOC
unmapped pages. Looks like it to me though.
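
To sketch the guard being described (a paraphrase of the behaviour, not the
exact arm64 source; clear_pte_valid() is a hypothetical stand-in for the
real page-table update):

int set_direct_map_invalid_noflush(struct page *page)
{
	/*
	 * arm64 only patches the linear map when rodata_full is set;
	 * otherwise this returns success without changing anything,
	 * which is the bail-out described above.
	 */
	if (!rodata_full)
		return 0;

	return clear_pte_valid(page);
}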



Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation

2020-10-28 Thread Edgecombe, Rick P
On Wed, 2020-10-28 at 13:30 +0200, Mike Rapoport wrote:
> On Wed, Oct 28, 2020 at 11:20:12AM +, Will Deacon wrote:
> > On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote:
> > > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P
> > > wrote:
> > > > On Mon, 2020-10-26 at 11:05 +0200, Mike Rapoport wrote:
> > > > > On Mon, Oct 26, 2020 at 01:13:52AM +, Edgecombe, Rick P
> > > > > wrote:
> > > > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote:
> > > > > > > Indeed, for architectures that define
> > > > > > > CONFIG_ARCH_HAS_SET_DIRECT_MAP
> > > > > > > it is
> > > > > > > possible that __kernel_map_pages() would fail, but since
> > > > > > > this
> > > > > > > function is
> > > > > > > void, the failure will go unnoticed.
> > > > > > 
> > > > > > Could you elaborate on how this could happen? Do you mean
> > > > > > during
> > > > > > runtime today or if something new was introduced?
> > > > > 
> > > > > A failure in__kernel_map_pages() may happen today. For
> > > > > instance, on
> > > > > x86
> > > > > if the kernel is built with DEBUG_PAGEALLOC.
> > > > > 
> > > > > __kernel_map_pages(page, 1, 0);
> > > > > 
> > > > > will need to split, say, 2M page and during the split an
> > > > > allocation
> > > > > of
> > > > > page table could fail.
> > > > 
> > > > On x86 at least, DEBUG_PAGEALLOC expects to never have to break
> > > > a page
> > > > on the direct map and even disables locking in cpa because it
> > > > assumes
> > > > this. If this is happening somehow anyway then we should
> > > > probably fix
> > > > that. Even if it's a debug feature, it will not be as useful if
> > > > it is
> > > > causing its own crashes.
> > > > 
> > > > I'm still wondering if there is something I'm missing here. It
> > > > seems
> > > > like you are saying there is a bug in some arch's, so let's add
> > > > a WARN
> > > > in cross-arch code to log it as it crashes. A warn and making
> > > > things
> > > > clearer seem like good ideas, but if there is a bug we should
> > > > fix it.
> > > > The code around the callers still functionally assumes re-mapping
> > > > can't fail.
> > > 
> > > Oh, I've meant x86 kernel *without* DEBUG_PAGEALLOC, and indeed
> > > the call
> > > that unmaps pages back in safe_copy_page will just reset a 4K
> > > page to
> > > NP because whatever made it NP at the first place already did the
> > > split.
> > > 
> > > Still, on arm64 with DEBUG_PAGEALLOC=n there is a possibility of
> > > a race
> > > between map/unmap dance in __vunmap() and safe_copy_page() that
> > > may
> > > cause access to unmapped memory:
> > > 
> > > __vunmap()
> > > vm_remove_mappings()
> > > set_direct_map_invalid()
> > >   safe_copy_page()
> > >   __kernel_map_pages()
> > >   return
> > >   do_copy_page() -> fault
> > >   
> > > This is a theoretical bug, but it is still not nice :)
> > >   
> > 
> > Just to clarify: this patch series fixes this problem, right?
> 
> Yes.
> 

Well, now I'm confused again.

As David pointed, __vunmap() should not be executing simultaneously
with the hibernate operation because hibernate can't snapshot while
data it needs to save is still updating. If a thread was paused when a
page was in an "invalid" state, it should be remapped by hibernate
before the copy.
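
For reference, the copy path in question looks roughly like this in v5.9's
kernel/power/snapshot.c (paraphrased):

static void safe_copy_page(void *dst, struct page *s_page)
{
	if (kernel_page_present(s_page)) {
		do_copy_page(dst, page_address(s_page));
	} else {
		/* temporarily restore the direct-map entry, copy, unmap again */
		kernel_map_pages(s_page, 1, 1);
		do_copy_page(dst, page_address(s_page));
		kernel_map_pages(s_page, 1, 0);
	}
}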

To level set, before reading this mail, my takeaways from the
discussions on potential hibernate/debug page alloc problems were:

Potential RISC-V issue:
Doesn't have hibernate support

Potential ARM issue:
The logic around when it's cpa determines pages might be unmapped looks
correct for current callers.

Potential x86 page break issue:
Seems to be ok for now, but a new set_memory_np() caller could violate
assumptions in hibernate.

Non-obvious thorny logic: 
General agreement it would be good to separate dependencies.

Behavior of V1 of this patchset:
No functional change other than addition of a warn in hibernate.


So "does this fix the problem", "yes" leaves me a bit confused... Not
saying there couldn't be any problems, especially due to the thorniness
and cross arch stride, but what is it exactly and how does this series
fix it?



Re: [PATCH] powerpc/smp: Move rcu_cpu_starting() earlier

2020-10-28 Thread Paul E. McKenney
On Wed, Oct 28, 2020 at 02:23:34PM -0400, Qian Cai wrote:
> The call to rcu_cpu_starting() in start_secondary() is not early enough
> in the CPU-hotplug onlining process, which results in lockdep splats as
> follows:
> 
>  WARNING: suspicious RCU usage
>  -
>  kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
> 
>  other info that might help us debug this:
> 
>  RCU used illegally from offline CPU!
>  rcu_scheduler_active = 1, debug_locks = 1
>  no locks held by swapper/1/0.
> 
>  Call Trace:
>  dump_stack+0xec/0x144 (unreliable)
>  lockdep_rcu_suspicious+0x128/0x14c
>  __lock_acquire+0x1060/0x1c60
>  lock_acquire+0x140/0x5f0
>  _raw_spin_lock_irqsave+0x64/0xb0
>  clockevents_register_device+0x74/0x270
>  register_decrementer_clockevent+0x94/0x110
>  start_secondary+0x134/0x800
>  start_secondary_prolog+0x10/0x14
> 
> This is avoided by moving the call to rcu_cpu_starting up near the
> beginning of the start_secondary() function. Note that the
> raw_smp_processor_id() is required in order to avoid calling into
> lockdep before RCU has declared the CPU to be watched for readers.
> 
> Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
> Signed-off-by: Qian Cai 

Acked-by: Paul E. McKenney 

> ---
>  arch/powerpc/kernel/smp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 3c6b9822f978..8c2857cbd960 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1393,13 +1393,14 @@ static void add_cpu_to_masks(int cpu)
>  /* Activate a secondary processor. */
>  void start_secondary(void *unused)
>  {
> - unsigned int cpu = smp_processor_id();
> + unsigned int cpu = raw_smp_processor_id();
>  
>   mmgrab(&init_mm);
>   current->active_mm = &init_mm;
>  
>   smp_store_cpu_info(cpu);
>   set_dec(tb_ticks_per_jiffy);
> + rcu_cpu_starting(cpu);
>   preempt_disable();
>   cpu_callin_map[cpu] = 1;
>  
> -- 
> 2.28.0
> 


Re: [PATCH 20/33] docs: ABI: testing: make the files compatible with ReST output

2020-10-28 Thread Richard Cochran
On Wed, Oct 28, 2020 at 03:23:18PM +0100, Mauro Carvalho Chehab wrote:

> diff --git a/Documentation/ABI/testing/sysfs-uevent 
> b/Documentation/ABI/testing/sysfs-uevent
> index aa39f8d7bcdf..d0893dad3f38 100644
> --- a/Documentation/ABI/testing/sysfs-uevent
> +++ b/Documentation/ABI/testing/sysfs-uevent
> @@ -19,7 +19,8 @@ Description:
>  a transaction identifier so it's possible to use the same 
> UUID
>  value for one or more synthetic uevents in which case we
>  logically group these uevents together for any userspace
> -listeners. The UUID value appears in uevent as
> +listeners. The UUID value appears in uevent as:

I know almost nothing about Sphinx, but why have one colon here ^^^ and ...

> +
>  "SYNTH_UUID=----" environment
>  variable.
>  
> @@ -30,18 +31,19 @@ Description:
>  It's possible to define zero or more pairs - each pair is 
> then
>  delimited by a space character ' '. Each pair appears in
>  synthetic uevent as "SYNTH_ARG_KEY=VALUE". That means the KEY
> -name gains "SYNTH_ARG_" prefix to avoid possible collisions
> +name gains `SYNTH_ARG_` prefix to avoid possible collisions
>  with existing variables.
>  
> -Example of valid sequence written to the uevent file:
> +Example of valid sequence written to the uevent file::

... two here?

Thanks,
Richard


[PATCH 13/13] PCI: dwc: Remove unnecessary wrappers around dw_pcie_host_init()

2020-10-28 Thread Rob Herring
Many calls to dw_pcie_host_init() are in a wrapper function with
nothing else now. Let's remove the pointless extra layer.

Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Shawn Guo 
Cc: Sascha Hauer 
Cc: Pengutronix Kernel Team 
Cc: Fabio Estevam 
Cc: NXP Linux Team 
Cc: Murali Karicheri 
Cc: Minghuan Lian 
Cc: Mingkai Hu 
Cc: Roy Zang 
Cc: Yue Wang 
Cc: Kevin Hilman 
Cc: Neil Armstrong 
Cc: Jerome Brunet 
Cc: Martin Blumenstingl 
Cc: Jonathan Chocron 
Cc: Jesper Nilsson 
Cc: Xiaowei Song 
Cc: Binghui Wang 
Cc: Kunihiko Hayashi 
Cc: Masahiro Yamada 
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-amlo...@lists.infradead.org
Cc: linux-arm-ker...@axis.com
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-imx6.c   | 26 ++---
 drivers/pci/controller/dwc/pci-keystone.c   | 19 +--
 drivers/pci/controller/dwc/pci-layerscape.c | 26 ++---
 drivers/pci/controller/dwc/pci-meson.c  | 22 ++---
 drivers/pci/controller/dwc/pcie-al.c| 20 ++--
 drivers/pci/controller/dwc/pcie-artpec6.c   | 23 +++---
 drivers/pci/controller/dwc/pcie-kirin.c | 11 ++---
 drivers/pci/controller/dwc/pcie-uniphier.c  | 23 +++---
 8 files changed, 17 insertions(+), 153 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index 4ba0b1195ecf..73e5cfc0725a 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -842,25 +842,6 @@ static const struct dw_pcie_host_ops imx6_pcie_host_ops = {
.host_init = imx6_pcie_host_init,
 };
 
-static int imx6_add_pcie_port(struct imx6_pcie *imx6_pcie,
- struct platform_device *pdev)
-{
-   struct dw_pcie *pci = imx6_pcie->pci;
-   struct pcie_port *pp = &pci->pp;
-   struct device *dev = &pdev->dev;
-   int ret;
-
-   pp->ops = &imx6_pcie_host_ops;
-
-   ret = dw_pcie_host_init(pp);
-   if (ret) {
-   dev_err(dev, "failed to initialize host\n");
-   return ret;
-   }
-
-   return 0;
-}
-
 static const struct dw_pcie_ops dw_pcie_ops = {
.start_link = imx6_pcie_start_link,
 };
@@ -1004,6 +985,7 @@ static int imx6_pcie_probe(struct platform_device *pdev)
 
pci->dev = dev;
pci->ops = &dw_pcie_ops;
+   pci->pp.ops = &imx6_pcie_host_ops;
 
imx6_pcie->pci = pci;
imx6_pcie->drvdata = of_device_get_match_data(dev);
@@ -1153,11 +1135,7 @@ static int imx6_pcie_probe(struct platform_device *pdev)
if (ret)
return ret;
 
-   ret = imx6_add_pcie_port(imx6_pcie, pdev);
-   if (ret < 0)
-   return ret;
-
-   return 0;
+   return dw_pcie_host_init(&pci->pp);
 }
 
 static void imx6_pcie_shutdown(struct platform_device *pdev)
diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c
index 5a4bcc2b1ddb..719756160821 100644
--- a/drivers/pci/controller/dwc/pci-keystone.c
+++ b/drivers/pci/controller/dwc/pci-keystone.c
@@ -844,23 +844,6 @@ static irqreturn_t ks_pcie_err_irq_handler(int irq, void *priv)
return ks_pcie_handle_error_irq(ks_pcie);
 }
 
-static int __init ks_pcie_add_pcie_port(struct keystone_pcie *ks_pcie,
-   struct platform_device *pdev)
-{
-   struct dw_pcie *pci = ks_pcie->pci;
-   struct pcie_port *pp = &pci->pp;
-   struct device *dev = &pdev->dev;
-   int ret;
-
-   ret = dw_pcie_host_init(pp);
-   if (ret) {
-   dev_err(dev, "failed to initialize host\n");
-   return ret;
-   }
-
-   return 0;
-}
-
 static void ks_pcie_am654_write_dbi2(struct dw_pcie *pci, void __iomem *base,
 u32 reg, size_t size, u32 val)
 {
@@ -1255,7 +1238,7 @@ static int __init ks_pcie_probe(struct platform_device *pdev)
}
 
pci->pp.ops = host_ops;
-   ret = ks_pcie_add_pcie_port(ks_pcie, pdev);
+   ret = dw_pcie_host_init(&pci->pp);
if (ret < 0)
goto err_get_sync;
break;
diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
index 400ebbebd00f..44ad34cdc3bc 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -232,31 +232,12 @@ static const struct of_device_id ls_pcie_of_match[] = {
{ },
 };
 
-static int __init ls_add_pcie_port(struct ls_pcie *pcie)
-{
-   struct dw_pcie *pci = pcie->pci;
-   struct pcie_port *pp = &pci->pp;
-   struct device *dev = pci->dev;
-   int ret;
-
-   pp->ops = pcie->drvdata->ops;
-
-   ret = dw_pcie_host_init(pp);
-   if (ret) {
-   dev_err(dev, "failed to initialize host\n");
-   return ret;
-   }
-
-   return 0;
-}
-
 static int __init ls_pcie_probe(struct 

[PATCH 12/13] PCI: dwc: Move dw_pcie_setup_rc() to DWC common code

2020-10-28 Thread Rob Herring
All RC complex drivers must call dw_pcie_setup_rc(). The ordering of the
call shouldn't be too important other than being after any RC resets.

There's a few calls of dw_pcie_setup_rc() left as drivers implementing
suspend/resume need it.

Cc: Kishon Vijay Abraham I 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Jingoo Han 
Cc: Kukjin Kim 
Cc: Krzysztof Kozlowski 
Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Shawn Guo 
Cc: Sascha Hauer 
Cc: Pengutronix Kernel Team 
Cc: Fabio Estevam 
Cc: NXP Linux Team 
Cc: Murali Karicheri 
Cc: Minghuan Lian 
Cc: Mingkai Hu 
Cc: Roy Zang 
Cc: Yue Wang 
Cc: Kevin Hilman 
Cc: Neil Armstrong 
Cc: Jerome Brunet 
Cc: Martin Blumenstingl 
Cc: Thomas Petazzoni 
Cc: Jesper Nilsson 
Cc: Gustavo Pimentel 
Cc: Xiaowei Song 
Cc: Binghui Wang 
Cc: Andy Gross 
Cc: Bjorn Andersson 
Cc: Stanimir Varbanov 
Cc: Pratyush Anand 
Cc: Kunihiko Hayashi 
Cc: Masahiro Yamada 
Cc: linux-o...@vger.kernel.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-amlo...@lists.infradead.org
Cc: linux-arm-ker...@axis.com
Cc: linux-arm-...@vger.kernel.org
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-dra7xx.c   | 1 -
 drivers/pci/controller/dwc/pci-exynos.c   | 1 -
 drivers/pci/controller/dwc/pci-imx6.c | 1 -
 drivers/pci/controller/dwc/pci-keystone.c | 2 --
 drivers/pci/controller/dwc/pci-layerscape.c   | 2 --
 drivers/pci/controller/dwc/pci-meson.c| 2 --
 drivers/pci/controller/dwc/pcie-armada8k.c| 2 --
 drivers/pci/controller/dwc/pcie-artpec6.c | 1 -
 drivers/pci/controller/dwc/pcie-designware-host.c | 1 +
 drivers/pci/controller/dwc/pcie-designware-plat.c | 8 
 drivers/pci/controller/dwc/pcie-histb.c   | 3 ---
 drivers/pci/controller/dwc/pcie-kirin.c   | 2 --
 drivers/pci/controller/dwc/pcie-qcom.c| 1 -
 drivers/pci/controller/dwc/pcie-spear13xx.c   | 2 --
 drivers/pci/controller/dwc/pcie-uniphier.c| 2 --
 15 files changed, 1 insertion(+), 30 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c
index 72a5a2bf933b..b105af63854a 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -181,7 +181,6 @@ static int dra7xx_pcie_host_init(struct pcie_port *pp)
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
 
-   dw_pcie_setup_rc(pp);
dra7xx_pcie_enable_interrupts(dra7xx);
 
return 0;
diff --git a/drivers/pci/controller/dwc/pci-exynos.c b/drivers/pci/controller/dwc/pci-exynos.c
index 3939fe22e8a2..5c10a5432896 100644
--- a/drivers/pci/controller/dwc/pci-exynos.c
+++ b/drivers/pci/controller/dwc/pci-exynos.c
@@ -372,7 +372,6 @@ static int exynos_pcie_host_init(struct pcie_port *pp)
phy_init(ep->phy);
 
exynos_pcie_deassert_core_reset(ep);
-   dw_pcie_setup_rc(pp);
exynos_pcie_assert_reset(ep);
 
exynos_pcie_enable_interrupts(ep);
diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c
index c808b563486f..4ba0b1195ecf 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -834,7 +834,6 @@ static int imx6_pcie_host_init(struct pcie_port *pp)
imx6_pcie_init_phy(imx6_pcie);
imx6_pcie_deassert_core_reset(imx6_pcie);
imx6_setup_phy_mpll(imx6_pcie);
-   dw_pcie_setup_rc(pp);
 
return 0;
 }
diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c
index 90b222b020a3..5a4bcc2b1ddb 100644
--- a/drivers/pci/controller/dwc/pci-keystone.c
+++ b/drivers/pci/controller/dwc/pci-keystone.c
@@ -807,8 +807,6 @@ static int __init ks_pcie_host_init(struct pcie_port *pp)
if (ret)
return ret;
 
-   dw_pcie_setup_rc(pp);
-
ks_pcie_stop_link(pci);
ks_pcie_setup_rc_app_regs(ks_pcie);
writew(PCI_IO_RANGE_TYPE_32 | (PCI_IO_RANGE_TYPE_32 << 8),
diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c
index 0d84986c4c16..400ebbebd00f 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -136,8 +136,6 @@ static int ls_pcie_host_init(struct pcie_port *pp)
 
ls_pcie_drop_msg_tlp(pcie);
 
-   dw_pcie_setup_rc(pp);
-
return 0;
 }
 
diff --git a/drivers/pci/controller/dwc/pci-meson.c b/drivers/pci/controller/dwc/pci-meson.c
index 2df0adcf0bf2..04589f0decb2 100644
--- a/drivers/pci/controller/dwc/pci-meson.c
+++ b/drivers/pci/controller/dwc/pci-meson.c
@@ -380,8 +380,6 @@ static int meson_pcie_host_init(struct pcie_port *pp)
meson_set_max_payload(mp, MAX_PAYLOAD_SIZE);
meson_set_max_rd_req_size(mp, MAX_READ_REQ_SIZE);
 
-   dw_pcie_setup_rc(pp);
-
return 0;
 }
 
diff --git a/drivers/pci/controller/dwc/pcie-armada8k.c b/drivers/pci/controller/dwc/pcie-armada8k.c

[PATCH 11/13] PCI: dwc: Move dw_pcie_msi_init() into core

2020-10-28 Thread Rob Herring
The host drivers which call dw_pcie_msi_init() are all the ones using
the built-in MSI controller, so let's move it into the common DWC code.

Cc: Kishon Vijay Abraham I 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Jingoo Han 
Cc: Kukjin Kim 
Cc: Krzysztof Kozlowski 
Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Shawn Guo 
Cc: Sascha Hauer 
Cc: Pengutronix Kernel Team 
Cc: Fabio Estevam 
Cc: NXP Linux Team 
Cc: Yue Wang 
Cc: Kevin Hilman 
Cc: Neil Armstrong 
Cc: Jerome Brunet 
Cc: Martin Blumenstingl 
Cc: Jesper Nilsson 
Cc: Gustavo Pimentel 
Cc: Xiaowei Song 
Cc: Binghui Wang 
Cc: Stanimir Varbanov 
Cc: Andy Gross 
Cc: Bjorn Andersson 
Cc: Pratyush Anand 
Cc: Thierry Reding 
Cc: Jonathan Hunter 
Cc: Kunihiko Hayashi 
Cc: Masahiro Yamada 
Cc: linux-o...@vger.kernel.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-amlo...@lists.infradead.org
Cc: linux-arm-ker...@axis.com
Cc: linux-arm-...@vger.kernel.org
Cc: linux-te...@vger.kernel.org
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-dra7xx.c   |  2 --
 drivers/pci/controller/dwc/pci-exynos.c   |  4 
 drivers/pci/controller/dwc/pci-imx6.c |  1 -
 drivers/pci/controller/dwc/pci-meson.c|  1 -
 drivers/pci/controller/dwc/pcie-artpec6.c |  1 -
 drivers/pci/controller/dwc/pcie-designware-host.c |  8 +---
 drivers/pci/controller/dwc/pcie-designware-plat.c |  1 -
 drivers/pci/controller/dwc/pcie-designware.h  | 10 --
 drivers/pci/controller/dwc/pcie-histb.c   |  2 --
 drivers/pci/controller/dwc/pcie-kirin.c   |  1 -
 drivers/pci/controller/dwc/pcie-qcom.c|  2 --
 drivers/pci/controller/dwc/pcie-spear13xx.c   |  6 +-
 drivers/pci/controller/dwc/pcie-tegra194.c|  2 --
 drivers/pci/controller/dwc/pcie-uniphier.c|  1 -
 14 files changed, 6 insertions(+), 36 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c 
b/drivers/pci/controller/dwc/pci-dra7xx.c
index 054423d9646d..72a5a2bf933b 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -182,8 +182,6 @@ static int dra7xx_pcie_host_init(struct pcie_port *pp)
struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci);
 
dw_pcie_setup_rc(pp);
-
-   dw_pcie_msi_init(pp);
dra7xx_pcie_enable_interrupts(dra7xx);
 
return 0;
diff --git a/drivers/pci/controller/dwc/pci-exynos.c 
b/drivers/pci/controller/dwc/pci-exynos.c
index 6498b615c834..3939fe22e8a2 100644
--- a/drivers/pci/controller/dwc/pci-exynos.c
+++ b/drivers/pci/controller/dwc/pci-exynos.c
@@ -273,12 +273,8 @@ static irqreturn_t exynos_pcie_irq_handler(int irq, void 
*arg)
 
 static void exynos_pcie_msi_init(struct exynos_pcie *ep)
 {
-   struct dw_pcie *pci = ep->pci;
-   struct pcie_port *pp = &pci->pp;
u32 val;
 
-   dw_pcie_msi_init(pp);
-
/* enable MSI interrupt */
val = exynos_pcie_readl(ep->mem_res->elbi_base, PCIE_IRQ_EN_LEVEL);
val |= IRQ_MSI_ENABLE;
diff --git a/drivers/pci/controller/dwc/pci-imx6.c 
b/drivers/pci/controller/dwc/pci-imx6.c
index 20e249efb02c..c808b563486f 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -835,7 +835,6 @@ static int imx6_pcie_host_init(struct pcie_port *pp)
imx6_pcie_deassert_core_reset(imx6_pcie);
imx6_setup_phy_mpll(imx6_pcie);
dw_pcie_setup_rc(pp);
-   dw_pcie_msi_init(pp);
 
return 0;
 }
diff --git a/drivers/pci/controller/dwc/pci-meson.c 
b/drivers/pci/controller/dwc/pci-meson.c
index 41a3351b100b..2df0adcf0bf2 100644
--- a/drivers/pci/controller/dwc/pci-meson.c
+++ b/drivers/pci/controller/dwc/pci-meson.c
@@ -381,7 +381,6 @@ static int meson_pcie_host_init(struct pcie_port *pp)
meson_set_max_rd_req_size(mp, MAX_READ_REQ_SIZE);
 
dw_pcie_setup_rc(pp);
-   dw_pcie_msi_init(pp);
 
return 0;
 }
diff --git a/drivers/pci/controller/dwc/pcie-artpec6.c 
b/drivers/pci/controller/dwc/pcie-artpec6.c
index 8b3da3038ac3..7ee8f3c83f8f 100644
--- a/drivers/pci/controller/dwc/pcie-artpec6.c
+++ b/drivers/pci/controller/dwc/pcie-artpec6.c
@@ -329,7 +329,6 @@ static int artpec6_pcie_host_init(struct pcie_port *pp)
artpec6_pcie_deassert_core_reset(artpec6_pcie);
artpec6_pcie_wait_for_phy(artpec6_pcie);
dw_pcie_setup_rc(pp);
-   dw_pcie_msi_init(pp);
 
return 0;
 }
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c 
b/drivers/pci/controller/dwc/pcie-designware-host.c
index f5f9d4e58aa3..025514e00a42 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -256,7 +256,7 @@ int dw_pcie_allocate_domains(struct pcie_port *pp)
return 0;
 }
 
-void dw_pcie_free_msi(struct pcie_port *pp)
+static void dw_pcie_free_msi(struct pcie_port *pp)
 {
if (pp->msi_irq) {
irq_set_chained_handler(pp->msi_irq, NULL);
@@ -275,12 +275,12 @@ void 

[PATCH 10/13] PCI: dwc: Move link handling into common code

2020-10-28 Thread Rob Herring
All the DWC drivers do link setup and checks at roughly the same time.
Let's use the existing .start_link() hook (currently only used in EP
mode) and move the link handling to the core code.

The behavior for a link down was inconsistent: some drivers would fail
probe in that case while others succeeded. Let's standardize this to
succeed, as there are use cases where devices (and the link) appear later
even without hotplug. For example, a reconfigured FPGA device.
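
With that, the common bring-up reduces to roughly the following sketch
in dw_pcie_host_init() (err_free_msi is assumed as the local unwind
label; the point is that a link that stays down no longer fails probe):

        if (!dw_pcie_link_up(pci) && pci->ops->start_link) {
                ret = pci->ops->start_link(pci);
                if (ret)
                        goto err_free_msi;
        }

        /* Ignore errors, the link may come up later */
        dw_pcie_wait_for_link(pci);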

Cc: Kishon Vijay Abraham I 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Jingoo Han 
Cc: Kukjin Kim 
Cc: Krzysztof Kozlowski 
Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Shawn Guo 
Cc: Sascha Hauer 
Cc: Pengutronix Kernel Team 
Cc: Fabio Estevam 
Cc: NXP Linux Team 
Cc: Murali Karicheri 
Cc: Yue Wang 
Cc: Kevin Hilman 
Cc: Neil Armstrong 
Cc: Jerome Brunet 
Cc: Martin Blumenstingl 
Cc: Thomas Petazzoni 
Cc: Jesper Nilsson 
Cc: Gustavo Pimentel 
Cc: Xiaowei Song 
Cc: Binghui Wang 
Cc: Andy Gross 
Cc: Bjorn Andersson 
Cc: Stanimir Varbanov 
Cc: Pratyush Anand 
Cc: Thierry Reding 
Cc: Jonathan Hunter 
Cc: Kunihiko Hayashi 
Cc: Masahiro Yamada 
Cc: linux-o...@vger.kernel.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-amlo...@lists.infradead.org
Cc: linux-arm-ker...@axis.com
Cc: linux-arm-...@vger.kernel.org
Cc: linux-te...@vger.kernel.org
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-dra7xx.c   |  2 -
 drivers/pci/controller/dwc/pci-exynos.c   | 41 +++--
 drivers/pci/controller/dwc/pci-imx6.c |  9 ++--
 drivers/pci/controller/dwc/pci-keystone.c |  9 
 drivers/pci/controller/dwc/pci-meson.c| 24 --
 drivers/pci/controller/dwc/pcie-armada8k.c| 39 +++-
 drivers/pci/controller/dwc/pcie-artpec6.c |  2 -
 .../pci/controller/dwc/pcie-designware-host.c |  9 
 .../pci/controller/dwc/pcie-designware-plat.c |  3 --
 drivers/pci/controller/dwc/pcie-histb.c   | 34 +++---
 drivers/pci/controller/dwc/pcie-kirin.c   | 23 ++
 drivers/pci/controller/dwc/pcie-qcom.c| 19 ++--
 drivers/pci/controller/dwc/pcie-spear13xx.c   | 46 ---
 drivers/pci/controller/dwc/pcie-tegra194.c|  1 -
 drivers/pci/controller/dwc/pcie-uniphier.c| 13 ++
 15 files changed, 103 insertions(+), 171 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c 
b/drivers/pci/controller/dwc/pci-dra7xx.c
index 6b75c68dddb5..054423d9646d 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -183,8 +183,6 @@ static int dra7xx_pcie_host_init(struct pcie_port *pp)
 
dw_pcie_setup_rc(pp);
 
-   dra7xx_pcie_establish_link(pci);
-   dw_pcie_wait_for_link(pci);
dw_pcie_msi_init(pp);
dra7xx_pcie_enable_interrupts(dra7xx);
 
diff --git a/drivers/pci/controller/dwc/pci-exynos.c 
b/drivers/pci/controller/dwc/pci-exynos.c
index 7734394953e5..6498b615c834 100644
--- a/drivers/pci/controller/dwc/pci-exynos.c
+++ b/drivers/pci/controller/dwc/pci-exynos.c
@@ -229,30 +229,9 @@ static void exynos_pcie_assert_reset(struct exynos_pcie 
*ep)
GPIOF_OUT_INIT_HIGH, "RESET");
 }
 
-static int exynos_pcie_establish_link(struct exynos_pcie *ep)
+static int exynos_pcie_start_link(struct dw_pcie *pci)
 {
-   struct dw_pcie *pci = ep->pci;
-   struct pcie_port *pp = &pci->pp;
-   struct device *dev = pci->dev;
-
-   if (dw_pcie_link_up(pci)) {
-   dev_err(dev, "Link already up\n");
-   return 0;
-   }
-
-   exynos_pcie_assert_core_reset(ep);
-
-   phy_reset(ep->phy);
-
-   exynos_pcie_writel(ep->mem_res->elbi_base, 1,
-   PCIE_PWR_RESET);
-
-   phy_power_on(ep->phy);
-   phy_init(ep->phy);
-
-   exynos_pcie_deassert_core_reset(ep);
-   dw_pcie_setup_rc(pp);
-   exynos_pcie_assert_reset(ep);
+   struct exynos_pcie *ep = to_exynos_pcie(pci);
 
/* assert LTSSM enable */
exynos_pcie_writel(ep->mem_res->elbi_base, PCIE_ELBI_LTSSM_ENABLE,
@@ -386,7 +365,20 @@ static int exynos_pcie_host_init(struct pcie_port *pp)
 
pp->bridge->ops = &exynos_pci_ops;
 
-   exynos_pcie_establish_link(ep);
+   exynos_pcie_assert_core_reset(ep);
+
+   phy_reset(ep->phy);
+
+   exynos_pcie_writel(ep->mem_res->elbi_base, 1,
+   PCIE_PWR_RESET);
+
+   phy_power_on(ep->phy);
+   phy_init(ep->phy);
+
+   exynos_pcie_deassert_core_reset(ep);
+   dw_pcie_setup_rc(pp);
+   exynos_pcie_assert_reset(ep);
+
exynos_pcie_enable_interrupts(ep);
 
return 0;
@@ -430,6 +422,7 @@ static const struct dw_pcie_ops dw_pcie_ops = {
.read_dbi = exynos_pcie_read_dbi,
.write_dbi = exynos_pcie_write_dbi,
.link_up = exynos_pcie_link_up,
+   .start_link = exynos_pcie_start_link,
 };
 
 static int __init exynos_pcie_probe(struct platform_device *pdev)
diff --git a/drivers/pci/controller/dwc/pci-imx6.c 

[PATCH 09/13] PCI: dwc: Rework MSI initialization

2020-10-28 Thread Rob Herring
There are 3 possible MSI implementations for the DWC host. The first is
the built-in DWC MSI controller. The second is a custom MSI controller
that is part of the PCI host (keystone only). The third is an external
MSI controller (typically GICv3 ITS). Currently, the last two are
distinguished with a .msi_host_init() hook, with the third option using
an empty function. However, we can detect the third case from the
presence of the 'msi-parent' or 'msi-map' properties, so let's do that
instead and remove the empty functions.
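
The resulting selection logic, condensed (this mirrors the
pcie-designware-host.c hunk below):

        /*
         *  .msi_host_init set          -> custom MSI controller (keystone)
         *  "msi-parent"/"msi-map" seen -> external MSI controller (e.g. ITS)
         *  otherwise                   -> built-in DWC MSI controller
         */
        pp->has_msi_ctrl = !(pp->ops->msi_host_init ||
                             of_property_read_bool(np, "msi-parent") ||
                             of_property_read_bool(np, "msi-map"));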

Cc: Murali Karicheri 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Minghuan Lian 
Cc: Mingkai Hu 
Cc: Roy Zang 
Cc: Jingoo Han 
Cc: Gustavo Pimentel 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-keystone.c |  9 ---
 drivers/pci/controller/dwc/pci-layerscape.c   | 25 ---
 .../pci/controller/dwc/pcie-designware-host.c | 20 +--
 drivers/pci/controller/dwc/pcie-designware.h  |  1 +
 drivers/pci/controller/dwc/pcie-intel-gw.c|  9 ---
 5 files changed, 13 insertions(+), 51 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-keystone.c 
b/drivers/pci/controller/dwc/pci-keystone.c
index 9cf14f13798b..784385ae6074 100644
--- a/drivers/pci/controller/dwc/pci-keystone.c
+++ b/drivers/pci/controller/dwc/pci-keystone.c
@@ -272,14 +272,6 @@ static void ks_pcie_handle_legacy_irq(struct keystone_pcie 
*ks_pcie,
ks_pcie_app_writel(ks_pcie, IRQ_EOI, offset);
 }
 
-/*
- * Dummy function so that DW core doesn't configure MSI
- */
-static int ks_pcie_am654_msi_host_init(struct pcie_port *pp)
-{
-   return 0;
-}
-
 static void ks_pcie_enable_error_irq(struct keystone_pcie *ks_pcie)
 {
ks_pcie_app_writel(ks_pcie, ERR_IRQ_ENABLE_SET, ERR_IRQ_ALL);
@@ -854,7 +846,6 @@ static const struct dw_pcie_host_ops ks_pcie_host_ops = {
 
 static const struct dw_pcie_host_ops ks_pcie_am654_host_ops = {
.host_init = ks_pcie_host_init,
-   .msi_host_init = ks_pcie_am654_msi_host_init,
 };
 
 static irqreturn_t ks_pcie_err_irq_handler(int irq, void *priv)
diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index 53e56d54c482..0d84986c4c16 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -168,37 +168,12 @@ static int ls1021_pcie_host_init(struct pcie_port *pp)
return ls_pcie_host_init(pp);
 }
 
-static int ls_pcie_msi_host_init(struct pcie_port *pp)
-{
-   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
-   struct device *dev = pci->dev;
-   struct device_node *np = dev->of_node;
-   struct device_node *msi_node;
-
-   /*
-* The MSI domain is set by the generic of_msi_configure().  This
-* .msi_host_init() function keeps us from doing the default MSI
-* domain setup in dw_pcie_host_init() and also enforces the
-* requirement that "msi-parent" exists.
-*/
-   msi_node = of_parse_phandle(np, "msi-parent", 0);
-   if (!msi_node) {
-   dev_err(dev, "failed to find msi-parent\n");
-   return -EINVAL;
-   }
-
-   of_node_put(msi_node);
-   return 0;
-}
-
 static const struct dw_pcie_host_ops ls1021_pcie_host_ops = {
.host_init = ls1021_pcie_host_init,
-   .msi_host_init = ls_pcie_msi_host_init,
 };
 
 static const struct dw_pcie_host_ops ls_pcie_host_ops = {
.host_init = ls_pcie_host_init,
-   .msi_host_init = ls_pcie_msi_host_init,
 };
 
 static const struct dw_pcie_ops dw_ls1021_pcie_ops = {
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c 
b/drivers/pci/controller/dwc/pcie-designware-host.c
index 0f77e4d4b385..6cebdd9bbd2e 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -365,6 +365,10 @@ int dw_pcie_host_init(struct pcie_port *pp)
pci->link_gen = of_pci_get_max_link_speed(np);
 
if (pci_msi_enabled()) {
+   pp->has_msi_ctrl = !(pp->ops->msi_host_init ||
+of_property_read_bool(np, "msi-parent") ||
+of_property_read_bool(np, "msi-map"));
+
if (!pp->num_vectors) {
pp->num_vectors = MSI_DEF_NUM_VECTORS;
} else if (pp->num_vectors > MAX_MSI_IRQS) {
@@ -372,7 +376,11 @@ int dw_pcie_host_init(struct pcie_port *pp)
return -EINVAL;
}
 
-   if (!pp->ops->msi_host_init) {
+   if (pp->ops->msi_host_init) {
+   ret = pp->ops->msi_host_init(pp);
+   if (ret < 0)
+   return ret;
+   } else if (pp->has_msi_ctrl) {
if (!pp->msi_irq) {
pp->msi_irq = platform_get_irq_byname(pdev, 
"msi");
if (pp->msi_irq < 0) {

[PATCH 08/13] PCI: dwc: Move MSI interrupt setup into DWC common code

2020-10-28 Thread Rob Herring
Platforms using the built-in DWC MSI controller all have a dedicated
interrupt with the "msi" name or at index 0, so let's move setting up
the interrupt to the common DWC code.

spear13xx and dra7xx are the two oddballs with muxed interrupts, so
we need to prevent configuring the MSI interrupt by setting msi_irq
to a negative value.
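
A sketch of the common acquisition (the fallback from the "msi" name to
IRQ index 0 is spelled out here as an assumption; a driver that pre-set
pp->msi_irq to a negative value never enters the block):

        if (!pp->msi_irq) {
                pp->msi_irq = platform_get_irq_byname_optional(pdev, "msi");
                if (pp->msi_irq == -ENXIO)
                        pp->msi_irq = platform_get_irq(pdev, 0);
                if (pp->msi_irq < 0)
                        return pp->msi_irq;
        }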

Cc: Jingoo Han 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Kukjin Kim 
Cc: Krzysztof Kozlowski 
Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Shawn Guo 
Cc: Sascha Hauer 
Cc: Pengutronix Kernel Team 
Cc: Fabio Estevam 
Cc: NXP Linux Team 
Cc: Yue Wang 
Cc: Kevin Hilman 
Cc: Neil Armstrong 
Cc: Jerome Brunet 
Cc: Martin Blumenstingl 
Cc: Jesper Nilsson 
Cc: Gustavo Pimentel 
Cc: Xiaowei Song 
Cc: Binghui Wang 
Cc: Stanimir Varbanov 
Cc: Andy Gross 
Cc: Bjorn Andersson 
Cc: Pratyush Anand 
Cc: Thierry Reding 
Cc: Jonathan Hunter 
Cc: Kunihiko Hayashi 
Cc: Masahiro Yamada 
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-amlo...@lists.infradead.org
Cc: linux-arm-ker...@axis.com
Cc: linux-arm-...@vger.kernel.org
Cc: linux-te...@vger.kernel.org
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-dra7xx.c   |  3 +++
 drivers/pci/controller/dwc/pci-exynos.c   |  6 -
 drivers/pci/controller/dwc/pci-imx6.c |  6 -
 drivers/pci/controller/dwc/pci-meson.c|  6 -
 drivers/pci/controller/dwc/pcie-artpec6.c |  6 -
 .../pci/controller/dwc/pcie-designware-host.c | 11 +-
 .../pci/controller/dwc/pcie-designware-plat.c |  6 -
 drivers/pci/controller/dwc/pcie-histb.c   |  6 -
 drivers/pci/controller/dwc/pcie-kirin.c   | 22 ---
 drivers/pci/controller/dwc/pcie-qcom.c|  8 ---
 drivers/pci/controller/dwc/pcie-spear13xx.c   |  1 +
 drivers/pci/controller/dwc/pcie-tegra194.c|  8 ---
 drivers/pci/controller/dwc/pcie-uniphier.c|  6 -
 13 files changed, 14 insertions(+), 81 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c 
b/drivers/pci/controller/dwc/pci-dra7xx.c
index 4d0c35a4aa59..6b75c68dddb5 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -489,6 +489,9 @@ static int __init dra7xx_add_pcie_port(struct dra7xx_pcie 
*dra7xx,
if (pp->irq < 0)
return pp->irq;
 
+   /* MSI IRQ is muxed */
+   pp->msi_irq = -ENODEV;
+
ret = dra7xx_pcie_init_irq_domain(pp);
if (ret < 0)
return ret;
diff --git a/drivers/pci/controller/dwc/pci-exynos.c 
b/drivers/pci/controller/dwc/pci-exynos.c
index 242683cde04a..7734394953e5 100644
--- a/drivers/pci/controller/dwc/pci-exynos.c
+++ b/drivers/pci/controller/dwc/pci-exynos.c
@@ -415,12 +415,6 @@ static int __init exynos_add_pcie_port(struct exynos_pcie 
*ep,
return ret;
}
 
-   if (IS_ENABLED(CONFIG_PCI_MSI)) {
-   pp->msi_irq = platform_get_irq(pdev, 0);
-   if (pp->msi_irq < 0)
-   return pp->msi_irq;
-   }
-
pp->ops = &exynos_pcie_host_ops;
 
ret = dw_pcie_host_init(pp);
diff --git a/drivers/pci/controller/dwc/pci-imx6.c 
b/drivers/pci/controller/dwc/pci-imx6.c
index 7dd137d62dca..1b979c956fcd 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -853,12 +853,6 @@ static int imx6_add_pcie_port(struct imx6_pcie *imx6_pcie,
struct device *dev = &pdev->dev;
int ret;
 
-   if (IS_ENABLED(CONFIG_PCI_MSI)) {
-   pp->msi_irq = platform_get_irq_byname(pdev, "msi");
-   if (pp->msi_irq < 0)
-   return pp->msi_irq;
-   }
-
pp->ops = &imx6_pcie_host_ops;
 
ret = dw_pcie_host_init(pp);
diff --git a/drivers/pci/controller/dwc/pci-meson.c 
b/drivers/pci/controller/dwc/pci-meson.c
index 1913dc2c8fa0..10d65b3093e4 100644
--- a/drivers/pci/controller/dwc/pci-meson.c
+++ b/drivers/pci/controller/dwc/pci-meson.c
@@ -405,12 +405,6 @@ static int meson_add_pcie_port(struct meson_pcie *mp,
struct device *dev = &pdev->dev;
int ret;
 
-   if (IS_ENABLED(CONFIG_PCI_MSI)) {
-   pp->msi_irq = platform_get_irq(pdev, 0);
-   if (pp->msi_irq < 0)
-   return pp->msi_irq;
-   }
-
pp->ops = &meson_pcie_host_ops;
 
ret = dw_pcie_host_init(pp);
diff --git a/drivers/pci/controller/dwc/pcie-artpec6.c 
b/drivers/pci/controller/dwc/pcie-artpec6.c
index 52ad7909cd0c..a5239a58cee0 100644
--- a/drivers/pci/controller/dwc/pcie-artpec6.c
+++ b/drivers/pci/controller/dwc/pcie-artpec6.c
@@ -348,12 +348,6 @@ static int artpec6_add_pcie_port(struct artpec6_pcie 
*artpec6_pcie,
struct device *dev = pci->dev;
int ret;
 
-   if (IS_ENABLED(CONFIG_PCI_MSI)) {
-   pp->msi_irq = platform_get_irq_byname(pdev, "msi");
-   if (pp->msi_irq < 0)
-   return pp->msi_irq;
-   }
-
pp->ops = &artpec6_pcie_host_ops;
 
ret = dw_pcie_host_init(pp);
diff --git 

[PATCH 07/13] PCI: dwc: Drop the .set_num_vectors() host op

2020-10-28 Thread Rob Herring
There's no reason for the .set_num_vectors() host op. Drivers needing a
non-default value can just initialize pcie_port.num_vectors directly.
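
The per-driver conversion is then a one-line assignment before
dw_pcie_host_init(), e.g. for tegra (matching the hunk below):

        pp->num_vectors = MAX_MSI_IRQS; /* replaces .set_num_vectors() */
        ret = dw_pcie_host_init(pp);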

Cc: Jingoo Han 
Cc: Gustavo Pimentel 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Thierry Reding 
Cc: Jonathan Hunter 
Cc: linux-te...@vger.kernel.org
Signed-off-by: Rob Herring 
---
 .../pci/controller/dwc/pcie-designware-host.c | 19 ---
 .../pci/controller/dwc/pcie-designware-plat.c |  7 +--
 drivers/pci/controller/dwc/pcie-designware.h  |  1 -
 drivers/pci/controller/dwc/pcie-tegra194.c|  7 +--
 4 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c 
b/drivers/pci/controller/dwc/pcie-designware-host.c
index 265a48f1a0ae..1bd6a9762426 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -365,22 +365,11 @@ int dw_pcie_host_init(struct pcie_port *pp)
pci->link_gen = of_pci_get_max_link_speed(np);
 
if (pci_msi_enabled()) {
-   /*
-* If a specific SoC driver needs to change the
-* default number of vectors, it needs to implement
-* the set_num_vectors callback.
-*/
-   if (!pp->ops->set_num_vectors) {
+   if (!pp->num_vectors) {
pp->num_vectors = MSI_DEF_NUM_VECTORS;
-   } else {
-   pp->ops->set_num_vectors(pp);
-
-   if (pp->num_vectors > MAX_MSI_IRQS ||
-   pp->num_vectors == 0) {
-   dev_err(dev,
-   "Invalid number of vectors\n");
-   return -EINVAL;
-   }
+   } else if (pp->num_vectors > MAX_MSI_IRQS) {
+   dev_err(dev, "Invalid number of vectors\n");
+   return -EINVAL;
}
 
if (!pp->ops->msi_host_init) {
diff --git a/drivers/pci/controller/dwc/pcie-designware-plat.c 
b/drivers/pci/controller/dwc/pcie-designware-plat.c
index 562a05e07b1d..13fede1d4157 100644
--- a/drivers/pci/controller/dwc/pcie-designware-plat.c
+++ b/drivers/pci/controller/dwc/pcie-designware-plat.c
@@ -44,14 +44,8 @@ static int dw_plat_pcie_host_init(struct pcie_port *pp)
return 0;
 }
 
-static void dw_plat_set_num_vectors(struct pcie_port *pp)
-{
-   pp->num_vectors = MAX_MSI_IRQS;
-}
-
 static const struct dw_pcie_host_ops dw_plat_pcie_host_ops = {
.host_init = dw_plat_pcie_host_init,
-   .set_num_vectors = dw_plat_set_num_vectors,
 };
 
 static int dw_plat_pcie_establish_link(struct dw_pcie *pci)
@@ -128,6 +122,7 @@ static int dw_plat_add_pcie_port(struct dw_plat_pcie 
*dw_plat_pcie,
return pp->msi_irq;
}
 
+   pp->num_vectors = MAX_MSI_IRQS;
pp->ops = &dw_plat_pcie_host_ops;
 
ret = dw_pcie_host_init(pp);
diff --git a/drivers/pci/controller/dwc/pcie-designware.h 
b/drivers/pci/controller/dwc/pcie-designware.h
index ed19c34dd0fe..96382dcb2859 100644
--- a/drivers/pci/controller/dwc/pcie-designware.h
+++ b/drivers/pci/controller/dwc/pcie-designware.h
@@ -171,7 +171,6 @@ enum dw_pcie_device_mode {
 
 struct dw_pcie_host_ops {
int (*host_init)(struct pcie_port *pp);
-   void (*set_num_vectors)(struct pcie_port *pp);
int (*msi_host_init)(struct pcie_port *pp);
 };
 
diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c 
b/drivers/pci/controller/dwc/pcie-tegra194.c
index f8fca6794282..5e2841f58700 100644
--- a/drivers/pci/controller/dwc/pcie-tegra194.c
+++ b/drivers/pci/controller/dwc/pcie-tegra194.c
@@ -990,11 +990,6 @@ static int tegra_pcie_dw_link_up(struct dw_pcie *pci)
return !!(val & PCI_EXP_LNKSTA_DLLLA);
 }
 
-static void tegra_pcie_set_msi_vec_num(struct pcie_port *pp)
-{
-   pp->num_vectors = MAX_MSI_IRQS;
-}
-
 static int tegra_pcie_dw_start_link(struct dw_pcie *pci)
 {
struct tegra_pcie_dw *pcie = to_tegra_pcie(pci);
@@ -1019,7 +1014,6 @@ static const struct dw_pcie_ops tegra_dw_pcie_ops = {
 
 static struct dw_pcie_host_ops tegra_pcie_dw_host_ops = {
.host_init = tegra_pcie_dw_host_init,
-   .set_num_vectors = tegra_pcie_set_msi_vec_num,
 };
 
 static void tegra_pcie_disable_phy(struct tegra_pcie_dw *pcie)
@@ -1995,6 +1989,7 @@ static int tegra_pcie_dw_probe(struct platform_device 
*pdev)
pci->n_fts[1] = FTS_VAL;
 
pp = &pcie->pp;
+   pp->num_vectors = MAX_MSI_IRQS;
pcie->dev = &pdev->dev;
pcie->mode = (enum dw_pcie_device_mode)data->mode;
 
-- 
2.25.1



[PATCH 06/13] PCI: dwc/dra7xx: Use the common MSI irq_chip

2020-10-28 Thread Rob Herring
The dra7xx MSI irq_chip implementation is identical to the default DWC one.
The only difference is the interrupt handler, as the MSI interrupt is muxed
with other interrupts, but that doesn't affect the irq_chip part of it.
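
What dra7xx keeps is only the top-level demux; a minimal sketch (the
handler name is illustrative, dw_handle_msi_irq() is the existing common
demux):

        static irqreturn_t dra7xx_pcie_msi_irq_handler(int irq, void *arg)
        {
                struct dra7xx_pcie *dra7xx = arg;

                /* The mux changes who calls the demux, not the irq_chip */
                return dw_handle_msi_irq(&dra7xx->pci->pp);
        }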

Cc: Kishon Vijay Abraham I 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: linux-o...@vger.kernel.org
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-dra7xx.c | 125 
 1 file changed, 125 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c 
b/drivers/pci/controller/dwc/pci-dra7xx.c
index a4aabc85dbb1..4d0c35a4aa59 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -377,133 +377,8 @@ static int dra7xx_pcie_init_irq_domain(struct pcie_port 
*pp)
return 0;
 }
 
-static void dra7xx_pcie_setup_msi_msg(struct irq_data *d, struct msi_msg *msg)
-{
-   struct pcie_port *pp = irq_data_get_irq_chip_data(d);
-   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
-   u64 msi_target;
-
-   msi_target = (u64)pp->msi_data;
-
-   msg->address_lo = lower_32_bits(msi_target);
-   msg->address_hi = upper_32_bits(msi_target);
-
-   msg->data = d->hwirq;
-
-   dev_dbg(pci->dev, "msi#%d address_hi %#x address_lo %#x\n",
-   (int)d->hwirq, msg->address_hi, msg->address_lo);
-}
-
-static int dra7xx_pcie_msi_set_affinity(struct irq_data *d,
-   const struct cpumask *mask,
-   bool force)
-{
-   return -EINVAL;
-}
-
-static void dra7xx_pcie_bottom_mask(struct irq_data *d)
-{
-   struct pcie_port *pp = irq_data_get_irq_chip_data(d);
-   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
-   unsigned int res, bit, ctrl;
-   unsigned long flags;
-
-   raw_spin_lock_irqsave(&pp->lock, flags);
-
-   ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
-   res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
-   bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
-
-   pp->irq_mask[ctrl] |= BIT(bit);
-   dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res,
-  pp->irq_mask[ctrl]);
-
-   raw_spin_unlock_irqrestore(&pp->lock, flags);
-}
-
-static void dra7xx_pcie_bottom_unmask(struct irq_data *d)
-{
-   struct pcie_port *pp = irq_data_get_irq_chip_data(d);
-   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
-   unsigned int res, bit, ctrl;
-   unsigned long flags;
-
-   raw_spin_lock_irqsave(&pp->lock, flags);
-
-   ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
-   res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
-   bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
-
-   pp->irq_mask[ctrl] &= ~BIT(bit);
-   dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res,
-  pp->irq_mask[ctrl]);
-
-   raw_spin_unlock_irqrestore(&pp->lock, flags);
-}
-
-static void dra7xx_pcie_bottom_ack(struct irq_data *d)
-{
-   struct pcie_port *pp  = irq_data_get_irq_chip_data(d);
-   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
-   unsigned int res, bit, ctrl;
-
-   ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL;
-   res = ctrl * MSI_REG_CTRL_BLOCK_SIZE;
-   bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL;
-
-   dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_STATUS + res, BIT(bit));
-}
-
-static struct irq_chip dra7xx_pci_msi_bottom_irq_chip = {
-   .name = "DRA7XX-PCI-MSI",
-   .irq_ack = dra7xx_pcie_bottom_ack,
-   .irq_compose_msi_msg = dra7xx_pcie_setup_msi_msg,
-   .irq_set_affinity = dra7xx_pcie_msi_set_affinity,
-   .irq_mask = dra7xx_pcie_bottom_mask,
-   .irq_unmask = dra7xx_pcie_bottom_unmask,
-};
-
-static int dra7xx_pcie_msi_host_init(struct pcie_port *pp)
-{
-   struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
-   struct device *dev = pci->dev;
-   u32 ctrl, num_ctrls;
-   int ret;
-
-   pp->msi_irq_chip = &dra7xx_pci_msi_bottom_irq_chip;
-
-   num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL;
-   /* Initialize IRQ Status array */
-   for (ctrl = 0; ctrl < num_ctrls; ctrl++) {
-   pp->irq_mask[ctrl] = ~0;
-   dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK +
-   (ctrl * MSI_REG_CTRL_BLOCK_SIZE),
-   pp->irq_mask[ctrl]);
-   dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_ENABLE +
-   (ctrl * MSI_REG_CTRL_BLOCK_SIZE),
-   ~0);
-   }
-
-   ret = dw_pcie_allocate_domains(pp);
-   if (ret)
-   return ret;
-
-   pp->msi_data = dma_map_single_attrs(dev, &pp->msi_msg,
-  sizeof(pp->msi_msg),
-  DMA_FROM_DEVICE,
-  DMA_ATTR_SKIP_CPU_SYNC);
-   ret = dma_mapping_error(dev, pp->msi_data);
-   if (ret) {
-   dev_err(dev, "Failed to map MSI data\n");
-   pp->msi_data = 0;
-   

[PATCH 05/13] PCI: dwc: Ensure all outbound ATU windows are reset

2020-10-28 Thread Rob Herring
The Layerscape driver clears the ATU registers, which may have been
configured by the bootloader. Any driver could have the same issue,
and doing it for all drivers doesn't hurt, so let's move it into the
common DWC code.
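
The moved loop, with the rationale spelled out (see the
dw_pcie_setup_rc() hunk below):

        /*
         * A stale bootloader window that overlaps one the kernel programs
         * would let a single transaction match two outbound windows, so
         * reset every index before they are reprogrammed.
         */
        for (i = 0; i < pci->num_viewport; i++)
                dw_pcie_disable_atu(pci, i, DW_PCIE_REGION_OUTBOUND);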

Cc: Minghuan Lian 
Cc: Mingkai Hu 
Cc: Roy Zang 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Jingoo Han 
Cc: Gustavo Pimentel 
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-layerscape.c   | 14 --
 drivers/pci/controller/dwc/pcie-designware-host.c |  5 +
 2 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-layerscape.c 
b/drivers/pci/controller/dwc/pci-layerscape.c
index f24f79a70d9a..53e56d54c482 100644
--- a/drivers/pci/controller/dwc/pci-layerscape.c
+++ b/drivers/pci/controller/dwc/pci-layerscape.c
@@ -83,14 +83,6 @@ static void ls_pcie_drop_msg_tlp(struct ls_pcie *pcie)
iowrite32(val, pci->dbi_base + PCIE_STRFMR1);
 }
 
-static void ls_pcie_disable_outbound_atus(struct ls_pcie *pcie)
-{
-   int i;
-
-   for (i = 0; i < PCIE_IATU_NUM; i++)
-   dw_pcie_disable_atu(pcie->pci, i, DW_PCIE_REGION_OUTBOUND);
-}
-
 static int ls1021_pcie_link_up(struct dw_pcie *pci)
 {
u32 state;
@@ -136,12 +128,6 @@ static int ls_pcie_host_init(struct pcie_port *pp)
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
struct ls_pcie *pcie = to_ls_pcie(pci);
 
-   /*
-* Disable outbound windows configured by the bootloader to avoid
-* one transaction hitting multiple outbound windows.
-* dw_pcie_setup_rc() will reconfigure the outbound windows.
-*/
-   ls_pcie_disable_outbound_atus(pcie);
ls_pcie_fix_error_response(pcie);
 
dw_pcie_dbi_ro_wr_en(pci);
diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c 
b/drivers/pci/controller/dwc/pcie-designware-host.c
index cde45b2076ee..265a48f1a0ae 100644
--- a/drivers/pci/controller/dwc/pcie-designware-host.c
+++ b/drivers/pci/controller/dwc/pcie-designware-host.c
@@ -534,6 +534,7 @@ static struct pci_ops dw_pcie_ops = {
 
 void dw_pcie_setup_rc(struct pcie_port *pp)
 {
+   int i;
u32 val, ctrl, num_ctrls;
struct dw_pcie *pci = to_dw_pcie_from_pp(pp);
 
@@ -583,6 +584,10 @@ void dw_pcie_setup_rc(struct pcie_port *pp)
PCI_COMMAND_MASTER | PCI_COMMAND_SERR;
dw_pcie_writel_dbi(pci, PCI_COMMAND, val);
 
+   /* Ensure all outbound windows are disabled so there are no multiple matches */
+   for (i = 0; i < pci->num_viewport; i++)
+   dw_pcie_disable_atu(pci, i, DW_PCIE_REGION_OUTBOUND);
+
/*
 * If the platform provides its own child bus config accesses, it means
 * the platform uses its own address translation component rather than
-- 
2.25.1



[PATCH 04/13] PCI: dwc/intel-gw: Remove some unneeded function wrappers

2020-10-28 Thread Rob Herring
Remove some of the pointless levels of functions that just wrap or group
a series of other functions.

Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pcie-intel-gw.c | 47 --
 1 file changed, 16 insertions(+), 31 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-intel-gw.c 
b/drivers/pci/controller/dwc/pcie-intel-gw.c
index 88782653ed21..c562eb7454b1 100644
--- a/drivers/pci/controller/dwc/pcie-intel-gw.c
+++ b/drivers/pci/controller/dwc/pcie-intel-gw.c
@@ -152,19 +152,6 @@ static void intel_pcie_init_n_fts(struct dw_pcie *pci)
pci->n_fts[0] = PORT_AFR_N_FTS_GEN12_DFT;
 }
 
-static void intel_pcie_rc_setup(struct intel_pcie_port *lpp)
-{
-   struct dw_pcie *pci = &lpp->pci;
-
-   pci->atu_base = pci->dbi_base + 0xC;
-
-   intel_pcie_ltssm_disable(lpp);
-   intel_pcie_link_setup(lpp);
-   intel_pcie_init_n_fts(pci);
-   dw_pcie_setup_rc(&pci->pp);
-   dw_pcie_upconfig_setup(pci);
-}
-
 static int intel_pcie_ep_rst_init(struct intel_pcie_port *lpp)
 {
struct device *dev = lpp->pci.dev;
@@ -216,14 +203,6 @@ static void intel_pcie_device_rst_deassert(struct 
intel_pcie_port *lpp)
gpiod_set_value_cansleep(lpp->reset_gpio, 0);
 }
 
-static int intel_pcie_app_logic_setup(struct intel_pcie_port *lpp)
-{
-   intel_pcie_device_rst_deassert(lpp);
-   intel_pcie_ltssm_enable(lpp);
-
-   return dw_pcie_wait_for_link(&lpp->pci);
-}
-
 static void intel_pcie_core_irq_disable(struct intel_pcie_port *lpp)
 {
pcie_app_wr(lpp, PCIE_APP_IRNEN, 0);
@@ -273,11 +252,6 @@ static int intel_pcie_get_resources(struct platform_device 
*pdev)
return 0;
 }
 
-static void intel_pcie_deinit_phy(struct intel_pcie_port *lpp)
-{
-   phy_exit(lpp->phy);
-}
-
 static int intel_pcie_wait_l2(struct intel_pcie_port *lpp)
 {
u32 value;
@@ -314,6 +288,7 @@ static void intel_pcie_turn_off(struct intel_pcie_port *lpp)
 static int intel_pcie_host_setup(struct intel_pcie_port *lpp)
 {
int ret;
+   struct dw_pcie *pci = &lpp->pci;
 
intel_pcie_core_rst_assert(lpp);
intel_pcie_device_rst_assert(lpp);
@@ -330,8 +305,18 @@ static int intel_pcie_host_setup(struct intel_pcie_port 
*lpp)
goto clk_err;
}
 
-   intel_pcie_rc_setup(lpp);
-   ret = intel_pcie_app_logic_setup(lpp);
+   pci->atu_base = pci->dbi_base + 0xC;
+
+   intel_pcie_ltssm_disable(lpp);
+   intel_pcie_link_setup(lpp);
+   intel_pcie_init_n_fts(pci);
+   dw_pcie_setup_rc(&pci->pp);
+   dw_pcie_upconfig_setup(pci);
+
+   intel_pcie_device_rst_deassert(lpp);
+   intel_pcie_ltssm_enable(lpp);
+
+   ret = dw_pcie_wait_for_link(pci);
if (ret)
goto app_init_err;
 
@@ -345,7 +330,7 @@ static int intel_pcie_host_setup(struct intel_pcie_port 
*lpp)
clk_disable_unprepare(lpp->core_clk);
 clk_err:
intel_pcie_core_rst_assert(lpp);
-   intel_pcie_deinit_phy(lpp);
+   phy_exit(lpp->phy);
 
return ret;
 }
@@ -356,7 +341,7 @@ static void __intel_pcie_remove(struct intel_pcie_port *lpp)
intel_pcie_turn_off(lpp);
clk_disable_unprepare(lpp->core_clk);
intel_pcie_core_rst_assert(lpp);
-   intel_pcie_deinit_phy(lpp);
+   phy_exit(lpp->phy);
 }
 
 static int intel_pcie_remove(struct platform_device *pdev)
@@ -380,7 +365,7 @@ static int __maybe_unused intel_pcie_suspend_noirq(struct 
device *dev)
if (ret)
return ret;
 
-   intel_pcie_deinit_phy(lpp);
+   phy_exit(lpp->phy);
clk_disable_unprepare(lpp->core_clk);
return ret;
 }
-- 
2.25.1



[PATCH 03/13] PCI: dwc: Move "dbi", "dbi2", and "addr_space" resource setup into common code

2020-10-28 Thread Rob Herring
Most DWC drivers use the common register resource names "dbi", "dbi2", and
"addr_space", so let's move their setup into the DWC common code.

This means 'dbi_base' in particular is set up later, but it looks like no
drivers touch DBI registers before dw_pcie_host_init() or dw_pcie_ep_init().
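
A sketch of the common "dbi" setup this enables (host side shown;
devm_pci_remap_cfg_resource() as the mapping helper is an assumption):

        if (!pci->dbi_base) {
                struct resource *res =
                        platform_get_resource_byname(pdev, IORESOURCE_MEM, "dbi");

                pci->dbi_base = devm_pci_remap_cfg_resource(dev, res);
                if (IS_ERR(pci->dbi_base))
                        return PTR_ERR(pci->dbi_base);
        }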

Cc: Kishon Vijay Abraham I 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Murali Karicheri 
Cc: Minghuan Lian 
Cc: Mingkai Hu 
Cc: Roy Zang 
Cc: Jonathan Chocron 
Cc: Jesper Nilsson 
Cc: Jingoo Han 
Cc: Gustavo Pimentel 
Cc: Xiaowei Song 
Cc: Binghui Wang 
Cc: Andy Gross 
Cc: Bjorn Andersson 
Cc: Stanimir Varbanov 
Cc: Pratyush Anand 
Cc: Thierry Reding 
Cc: Jonathan Hunter 
Cc: Kunihiko Hayashi 
Cc: Masahiro Yamada 
Cc: linux-o...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-arm-ker...@axis.com
Cc: linux-arm-...@vger.kernel.org
Cc: linux-te...@vger.kernel.org
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-dra7xx.c   |  8 
 drivers/pci/controller/dwc/pci-keystone.c | 29 +---
 .../pci/controller/dwc/pci-layerscape-ep.c| 37 +--
 drivers/pci/controller/dwc/pcie-al.c  |  9 +---
 drivers/pci/controller/dwc/pcie-artpec6.c | 43 ++
 .../pci/controller/dwc/pcie-designware-ep.c   | 29 ++--
 .../pci/controller/dwc/pcie-designware-host.c |  7 +++
 .../pci/controller/dwc/pcie-designware-plat.c | 45 +--
 drivers/pci/controller/dwc/pcie-intel-gw.c|  4 --
 drivers/pci/controller/dwc/pcie-kirin.c   |  5 ---
 drivers/pci/controller/dwc/pcie-qcom.c|  8 
 drivers/pci/controller/dwc/pcie-spear13xx.c   | 11 +
 drivers/pci/controller/dwc/pcie-tegra194.c| 22 -
 drivers/pci/controller/dwc/pcie-uniphier-ep.c | 38 +---
 drivers/pci/controller/dwc/pcie-uniphier.c|  6 ---
 15 files changed, 47 insertions(+), 254 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c 
b/drivers/pci/controller/dwc/pci-dra7xx.c
index 6d012d2b1e90..a4aabc85dbb1 100644
--- a/drivers/pci/controller/dwc/pci-dra7xx.c
+++ b/drivers/pci/controller/dwc/pci-dra7xx.c
@@ -578,7 +578,6 @@ static int __init dra7xx_add_pcie_ep(struct dra7xx_pcie 
*dra7xx,
 {
int ret;
struct dw_pcie_ep *ep;
-   struct resource *res;
struct device *dev = &pdev->dev;
struct dw_pcie *pci = dra7xx->pci;
 
@@ -594,13 +593,6 @@ static int __init dra7xx_add_pcie_ep(struct dra7xx_pcie 
*dra7xx,
if (IS_ERR(pci->dbi_base2))
return PTR_ERR(pci->dbi_base2);
 
-   res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
-   if (!res)
-   return -EINVAL;
-
-   ep->phys_base = res->start;
-   ep->addr_size = resource_size(res);
-
ret = dw_pcie_ep_init(ep);
if (ret) {
dev_err(dev, "failed to initialize endpoint\n");
diff --git a/drivers/pci/controller/dwc/pci-keystone.c 
b/drivers/pci/controller/dwc/pci-keystone.c
index a222728238ca..9cf14f13798b 100644
--- a/drivers/pci/controller/dwc/pci-keystone.c
+++ b/drivers/pci/controller/dwc/pci-keystone.c
@@ -977,33 +977,6 @@ static const struct dw_pcie_ep_ops ks_pcie_am654_ep_ops = {
.get_features = _pcie_am654_get_features,
 };
 
-static int __init ks_pcie_add_pcie_ep(struct keystone_pcie *ks_pcie,
- struct platform_device *pdev)
-{
-   int ret;
-   struct dw_pcie_ep *ep;
-   struct resource *res;
-   struct device *dev = &pdev->dev;
-   struct dw_pcie *pci = ks_pcie->pci;
-
-   ep = &pci->ep;
-
-   res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space");
-   if (!res)
-   return -EINVAL;
-
-   ep->phys_base = res->start;
-   ep->addr_size = resource_size(res);
-
-   ret = dw_pcie_ep_init(ep);
-   if (ret) {
-   dev_err(dev, "failed to initialize endpoint\n");
-   return ret;
-   }
-
-   return 0;
-}
-
 static void ks_pcie_disable_phy(struct keystone_pcie *ks_pcie)
 {
int num_lanes = ks_pcie->num_lanes;
@@ -1313,7 +1286,7 @@ static int __init ks_pcie_probe(struct platform_device 
*pdev)
}
 
pci->ep.ops = ep_ops;
-   ret = ks_pcie_add_pcie_ep(ks_pcie, pdev);
+   ret = dw_pcie_ep_init(&pci->ep);
if (ret < 0)
goto err_get_sync;
break;
diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c
index 84206f265e54..4af031b3f0a0 100644
--- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
+++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
@@ -18,8 +18,6 @@
 
 #include "pcie-designware.h"
 
-#define PCIE_DBI2_OFFSET   0x1000  /* DBI2 base address*/
-
 #define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
 
 struct ls_pcie_ep_drvdata {
@@ -124,34 +122,6 @@ static const struct of_device_id ls_pcie_ep_of_match[] = {
{ },
 };
 
-static int __init 

[PATCH 02/13] PCI: dwc/intel-gw: Move ATU offset out of driver match data

2020-10-28 Thread Rob Herring
The ATU offset should be a register range in DT called 'atu', not driver
match data. Any future platforms with a different ATU offset should add
it to their DT.

This is also in preparation to do DBI resource setup in the core DWC
code, so let's move setting atu_base later in intel_pcie_rc_setup().
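
The core-side lookup would then look roughly like this (a sketch; the
DEFAULT_DBI_ATU_OFFSET fallback name is an assumption):

        res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "atu");
        if (res) {
                pci->atu_base = devm_ioremap_resource(dev, res);
                if (IS_ERR(pci->atu_base))
                        return PTR_ERR(pci->atu_base);
        } else {
                pci->atu_base = pci->dbi_base + DEFAULT_DBI_ATU_OFFSET;
        }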

Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pcie-intel-gw.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/controller/dwc/pcie-intel-gw.c 
b/drivers/pci/controller/dwc/pcie-intel-gw.c
index 5650cb78acba..77ef88333115 100644
--- a/drivers/pci/controller/dwc/pcie-intel-gw.c
+++ b/drivers/pci/controller/dwc/pcie-intel-gw.c
@@ -58,7 +58,6 @@
 
 struct intel_pcie_soc {
unsigned intpcie_ver;
-   unsigned intpcie_atu_offset;
u32 num_viewport;
 };
 
@@ -155,11 +154,15 @@ static void intel_pcie_init_n_fts(struct dw_pcie *pci)
 
 static void intel_pcie_rc_setup(struct intel_pcie_port *lpp)
 {
+   struct dw_pcie *pci = &lpp->pci;
+
+   pci->atu_base = pci->dbi_base + 0xC;
+
intel_pcie_ltssm_disable(lpp);
intel_pcie_link_setup(lpp);
-   intel_pcie_init_n_fts(&lpp->pci);
-   dw_pcie_setup_rc(&lpp->pci.pp);
-   dw_pcie_upconfig_setup(&lpp->pci);
+   intel_pcie_init_n_fts(pci);
+   dw_pcie_setup_rc(&pci->pp);
+   dw_pcie_upconfig_setup(pci);
 }
 
 static int intel_pcie_ep_rst_init(struct intel_pcie_port *lpp)
@@ -425,7 +428,6 @@ static const struct dw_pcie_host_ops intel_pcie_dw_ops = {
 
 static const struct intel_pcie_soc pcie_data = {
.pcie_ver = 0x520A,
-   .pcie_atu_offset =  0xC,
.num_viewport = 3,
 };
 
@@ -461,7 +463,6 @@ static int intel_pcie_probe(struct platform_device *pdev)
 
pci->ops = &intel_pcie_ops;
pci->version = data->pcie_ver;
-   pci->atu_base = pci->dbi_base + data->pcie_atu_offset;
pp->ops = &intel_pcie_dw_ops;
 
ret = dw_pcie_host_init(pp);
-- 
2.25.1



[PATCH 01/13] PCI: dwc/imx6: Drop setting PCI_MSI_FLAGS_ENABLE

2020-10-28 Thread Rob Herring
No other host driver sets the PCI_MSI_FLAGS_ENABLE bit, so it must not
be necessary. If it is, a comment is needed.

Cc: Richard Zhu 
Cc: Lucas Stach 
Cc: Lorenzo Pieralisi 
Cc: Bjorn Helgaas 
Cc: Shawn Guo 
Cc: Sascha Hauer 
Cc: Pengutronix Kernel Team 
Cc: Fabio Estevam 
Cc: NXP Linux Team 
Signed-off-by: Rob Herring 
---
 drivers/pci/controller/dwc/pci-imx6.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/pci/controller/dwc/pci-imx6.c 
b/drivers/pci/controller/dwc/pci-imx6.c
index 5cf1ef12fb9b..7dd137d62dca 100644
--- a/drivers/pci/controller/dwc/pci-imx6.c
+++ b/drivers/pci/controller/dwc/pci-imx6.c
@@ -1002,7 +1002,6 @@ static int imx6_pcie_probe(struct platform_device *pdev)
struct resource *dbi_base;
struct device_node *node = dev->of_node;
int ret;
-   u16 val;
 
imx6_pcie = devm_kzalloc(dev, sizeof(*imx6_pcie), GFP_KERNEL);
if (!imx6_pcie)
@@ -1167,13 +1166,6 @@ static int imx6_pcie_probe(struct platform_device *pdev)
if (ret < 0)
return ret;
 
-   if (pci_msi_enabled()) {
-   u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI);
-   val = dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS);
-   val |= PCI_MSI_FLAGS_ENABLE;
-   dw_pcie_writew_dbi(pci, offset + PCI_MSI_FLAGS, val);
-   }
-
return 0;
 }
 
-- 
2.25.1



[PATCH 00/13] PCI: dwc: Another round of clean-ups

2020-10-28 Thread Rob Herring
Here's another batch of DWC PCI host refactoring. This series primarily
moves more of the MSI, link up, and resource handling to the core
code.

No doubt I've probably broken something. Please test. A git branch is
here[1].

Rob

[1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git 
pci-more-dwc-cleanup

Rob Herring (13):
  PCI: dwc/imx6: Drop setting PCI_MSI_FLAGS_ENABLE
  PCI: dwc/intel-gw: Move ATU offset out of driver match data
  PCI: dwc: Move "dbi", "dbi2", and "addr_space" resource setup into
common code
  PCI: dwc/intel-gw: Remove some unneeded function wrappers
  PCI: dwc: Ensure all outbound ATU windows are reset
  PCI: dwc/dra7xx: Use the common MSI irq_chip
  PCI: dwc: Drop the .set_num_vectors() host op
  PCI: dwc: Move MSI interrupt setup into DWC common code
  PCI: dwc: Rework MSI initialization
  PCI: dwc: Move link handling into common code
  PCI: dwc: Move dw_pcie_msi_init() into core
  PCI: dwc: Move dw_pcie_setup_rc() to DWC common code
  PCI: dwc: Remove unnecessary wrappers around dw_pcie_host_init()

 drivers/pci/controller/dwc/pci-dra7xx.c   | 141 +-
 drivers/pci/controller/dwc/pci-exynos.c   |  50 ++-
 drivers/pci/controller/dwc/pci-imx6.c |  51 +--
 drivers/pci/controller/dwc/pci-keystone.c |  68 +
 .../pci/controller/dwc/pci-layerscape-ep.c|  37 +
 drivers/pci/controller/dwc/pci-layerscape.c   |  67 +
 drivers/pci/controller/dwc/pci-meson.c|  53 ++-
 drivers/pci/controller/dwc/pcie-al.c  |  29 +---
 drivers/pci/controller/dwc/pcie-armada8k.c|  37 ++---
 drivers/pci/controller/dwc/pcie-artpec6.c |  76 +-
 .../pci/controller/dwc/pcie-designware-ep.c   |  29 +++-
 .../pci/controller/dwc/pcie-designware-host.c |  80 ++
 .../pci/controller/dwc/pcie-designware-plat.c |  70 +
 drivers/pci/controller/dwc/pcie-designware.h  |  12 +-
 drivers/pci/controller/dwc/pcie-histb.c   |  37 ++---
 drivers/pci/controller/dwc/pcie-intel-gw.c|  59 ++--
 drivers/pci/controller/dwc/pcie-kirin.c   |  62 +---
 drivers/pci/controller/dwc/pcie-qcom.c|  38 +
 drivers/pci/controller/dwc/pcie-spear13xx.c   |  62 +++-
 drivers/pci/controller/dwc/pcie-tegra194.c|  40 +
 drivers/pci/controller/dwc/pcie-uniphier-ep.c |  38 +
 drivers/pci/controller/dwc/pcie-uniphier.c|  51 +--
 22 files changed, 217 insertions(+), 970 deletions(-)

--
2.25.1


Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation

2020-10-28 Thread Edgecombe, Rick P
On Wed, 2020-10-28 at 13:09 +0200, Mike Rapoport wrote:
> On Tue, Oct 27, 2020 at 09:46:35AM +0100, David Hildenbrand wrote:
> > On 27.10.20 09:38, Mike Rapoport wrote:
> > > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P
> > > wrote:
> > > 
> > > > Beyond whatever you are seeing, for the latter case of new
> > > > things
> > > > getting introduced to an interface with hidden dependencies...
> > > > Another
> > > > edge case could be a new caller to set_memory_np() could result
> > > > in
> > > > large NP pages. None of the callers today should cause this
> > > > AFAICT, but
> > > > it's not great to rely on the callers to know these details.
> > > A caller of set_memory_*() or set_direct_map_*() should expect a
> > > failure
> > > and be ready for that. So adding a WARN to safe_copy_page() is
> > > the first
> > > step in that direction :)
> > > 
> > 
> > I am probably missing something important, but why are we
> > saving/restoring
> > the content of pages that were explicitly removed from the identity
> > mapping
> > such that nobody will access them?
> 
> Actually, we should not be saving/restoring free pages during
> hibernation as there are several calls to mark_free_pages() that
> should
> exclude the free pages from the snapshot. I've tried to find why the
> fix
> that maps/unmaps a page to save it was required at the first place,
> but
> I could not find bug reports.
> 
> The closest I've got is an email from Rafael that asked to update
> "hibernate: handle DEBUG_PAGEALLOC" patch:
> 
> https://lore.kernel.org/linux-pm/200802200133.44098@sisk.pl/
> 
> Could it be that safe_copy_page() tries to workaround a non-existent
> problem?

It looks like page_alloc.c unmaps the page before it actually frees it,
so to hibernation the page could look like it is still allocated even
though it's unmapped? Maybe that small window is what the original fix
cared about.

There is also now the vmalloc case, which I am actually working on
expanding. So I think the re-mapping logic is needed.


[PATCH] powerpc/smp: Move rcu_cpu_starting() earlier

2020-10-28 Thread Qian Cai
The call to rcu_cpu_starting() in start_secondary() is not early enough
in the CPU-hotplug onlining process, which results in lockdep splats as
follows:

 WARNING: suspicious RCU usage
 -
 kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 RCU used illegally from offline CPU!
 rcu_scheduler_active = 1, debug_locks = 1
 no locks held by swapper/1/0.

 Call Trace:
 dump_stack+0xec/0x144 (unreliable)
 lockdep_rcu_suspicious+0x128/0x14c
 __lock_acquire+0x1060/0x1c60
 lock_acquire+0x140/0x5f0
 _raw_spin_lock_irqsave+0x64/0xb0
 clockevents_register_device+0x74/0x270
 register_decrementer_clockevent+0x94/0x110
 start_secondary+0x134/0x800
 start_secondary_prolog+0x10/0x14

This is avoided by moving the call to rcu_cpu_starting() up near the
beginning of the start_secondary() function. Note that
raw_smp_processor_id() is required in order to avoid calling into
lockdep before RCU has declared the CPU to be watched for readers.
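
For context on why the raw_ variant matters (simplified from
include/linux/smp.h; the debug wrapper can end up in lockdep via printk
before this CPU is watched):

        #ifdef CONFIG_DEBUG_PREEMPT
        # define smp_processor_id() debug_smp_processor_id() /* may use lockdep */
        #else
        # define smp_processor_id() raw_smp_processor_id()   /* plain per-CPU read */
        #endif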

Link: 
https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Signed-off-by: Qian Cai 
---
 arch/powerpc/kernel/smp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3c6b9822f978..8c2857cbd960 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1393,13 +1393,14 @@ static void add_cpu_to_masks(int cpu)
 /* Activate a secondary processor. */
 void start_secondary(void *unused)
 {
-   unsigned int cpu = smp_processor_id();
+   unsigned int cpu = raw_smp_processor_id();
 
mmgrab(&init_mm);
current->active_mm = &init_mm;
 
smp_store_cpu_info(cpu);
set_dec(tb_ticks_per_jiffy);
+   rcu_cpu_starting(cpu);
preempt_disable();
cpu_callin_map[cpu] = 1;
 
-- 
2.28.0



Re: [PATCH kernel v3 1/2] dma: Allow mixing bypass and mapped DMA operation

2020-10-28 Thread Christoph Hellwig
On Wed, Oct 28, 2020 at 06:00:29PM +1100, Alexey Kardashevskiy wrote:
> At the moment we allow bypassing DMA ops only when we can do this for
> the entire RAM. However, there are configs with mixed-type memory
> where we could still allow bypassing the IOMMU in most cases;
> POWERPC with persistent memory is one example.
> 
> This adds an arch hook to determine where bypass can still work and
> we invoke direct DMA API. The following patch checks the bus limit
> on POWERPC to allow or disallow direct mapping.
> 
> This adds a CONFIG_ARCH_HAS_DMA_MAP_DIRECT config option to make the
> arch hooks no-op by default.
> 
> Signed-off-by: Alexey Kardashevskiy 
> ---
>  kernel/dma/mapping.c | 24 
>  kernel/dma/Kconfig   |  4 
>  2 files changed, 24 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> index 51bb8fa8eb89..a0bc9eb876ed 100644
> --- a/kernel/dma/mapping.c
> +++ b/kernel/dma/mapping.c
> @@ -137,6 +137,18 @@ static inline bool dma_map_direct(struct device *dev,
>   return dma_go_direct(dev, *dev->dma_mask, ops);
>  }
>  
> +#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
> +bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
> +bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
> +bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, int 
> nents);
> +bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, 
> int nents);
> +#else
> +#define arch_dma_map_page_direct(d, a) (0)
> +#define arch_dma_unmap_page_direct(d, a) (0)
> +#define arch_dma_map_sg_direct(d, s, n) (0)
> +#define arch_dma_unmap_sg_direct(d, s, n) (0)
> +#endif

A bunch of overly long lines here.  Except for that this looks ok to me.
If you want me to queue up the series I can just fix it up.


Re: [PATCH kernel v2 1/2] dma: Allow mixing bypass and normal IOMMU operation

2020-10-28 Thread Christoph Hellwig
On Wed, Oct 28, 2020 at 05:55:23PM +1100, Alexey Kardashevskiy wrote:
>
> It is passing the address of the end of the mapped area, so passing a page
> struct would mean passing a page and an offset, which is an extra parameter;
> we do not want to do anything with the page in those hooks anyway, so I'd
> keep it as is.
>
>
>> and
>> maybe even hide the dma_map_direct inside it.
>
> Call dma_map_direct() from arch_dma_map_page_direct() if 
> arch_dma_map_page_direct() is defined? Seems suboptimal as it is going to 
> be bypass=true in most cases and we save one call by avoiding calling 
> arch_dma_map_page_direct(). Unless I missed something?

C does not even evaluate the right hand side of a || expression if the
left hand evaluates to true.
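
Concretely, the map-page call site in this series can stay a single
condition (a sketch; the exact v3 hunk may differ slightly), so the
common full-bypass case never evaluates the arch hook at all:

        if (dma_map_direct(dev, ops) ||
            arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
                addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
        else
                addr = ops->map_page(dev, page, offset, size, dir, attrs);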


[PATCH] powerpc/eeh_cache: Fix a possible debugfs deadlock

2020-10-28 Thread Qian Cai
Lockdep complains that a possible deadlock below in
eeh_addr_cache_show() because it is acquiring a lock with IRQ enabled,
but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQ
disabled. Let's just make eeh_addr_cache_show() acquire the lock with
IRQ disabled as well.

        CPU0                    CPU1
        ----                    ----
   lock(&pci_io_addr_cache_root.piar_lock);
                                local_irq_disable();
                                lock(>lock);
                                lock(&pci_io_addr_cache_root.piar_lock);
   <Interrupt>
     lock(>lock);

  *** DEADLOCK ***

  lock_acquire+0x140/0x5f0
  _raw_spin_lock_irqsave+0x64/0xb0
  eeh_addr_cache_insert_dev+0x48/0x390
  eeh_probe_device+0xb8/0x1a0
  pnv_pcibios_bus_add_device+0x3c/0x80
  pcibios_bus_add_device+0x118/0x290
  pci_bus_add_device+0x28/0xe0
  pci_bus_add_devices+0x54/0xb0
  pcibios_init+0xc4/0x124
  do_one_initcall+0xac/0x528
  kernel_init_freeable+0x35c/0x3fc
  kernel_init+0x24/0x148
  ret_from_kernel_thread+0x5c/0x80

  lock_acquire+0x140/0x5f0
  _raw_spin_lock+0x4c/0x70
  eeh_addr_cache_show+0x38/0x110
  seq_read+0x1a0/0x660
  vfs_read+0xc8/0x1f0
  ksys_read+0x74/0x130
  system_call_exception+0xf8/0x1d0
  system_call_common+0xe8/0x218

Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address 
cache")
Signed-off-by: Qian Cai 
---
 arch/powerpc/kernel/eeh_cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 6b50bf15d8c1..bf3270426d82 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -264,8 +264,9 @@ static int eeh_addr_cache_show(struct seq_file *s, void *v)
 {
struct pci_io_addr_range *piar;
struct rb_node *n;
+   unsigned long flags;
 
-   spin_lock(&pci_io_addr_cache_root.piar_lock);
+   spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n)) {
piar = rb_entry(n, struct pci_io_addr_range, rb_node);
 
@@ -273,7 +274,7 @@ static int eeh_addr_cache_show(struct seq_file *s, void *v)
   (piar->flags & IORESOURCE_IO) ? "i/o" : "mem",
  &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
}
-   spin_unlock(&pci_io_addr_cache_root.piar_lock);
+   spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);
 
return 0;
 }
-- 
2.28.0



[PATCH 19/33] docs: ABI: stable: make files ReST compatible

2020-10-28 Thread Mauro Carvalho Chehab
From: Mauro Carvalho Chehab 

Several entries in the stable ABI files won't parse if we pass
them directly to the ReST output.

Adjust them in order to allow adding their contents as-is to
the stable ABI book.

Signed-off-by: Mauro Carvalho Chehab 
Signed-off-by: Mauro Carvalho Chehab 
---
 Documentation/ABI/stable/firewire-cdev|  4 +
 Documentation/ABI/stable/sysfs-acpi-pmprofile | 22 +++--
 Documentation/ABI/stable/sysfs-bus-firewire   |  3 +
 Documentation/ABI/stable/sysfs-bus-nvmem  | 19 ++--
 Documentation/ABI/stable/sysfs-bus-usb|  6 +-
 .../ABI/stable/sysfs-class-backlight  |  1 +
 .../ABI/stable/sysfs-class-infiniband | 93 +--
 Documentation/ABI/stable/sysfs-class-rfkill   | 13 ++-
 Documentation/ABI/stable/sysfs-class-tpm  | 90 +-
 Documentation/ABI/stable/sysfs-devices|  5 +-
 Documentation/ABI/stable/sysfs-driver-ib_srp  |  1 +
 .../ABI/stable/sysfs-firmware-efi-vars|  4 +
 .../ABI/stable/sysfs-firmware-opal-dump   |  5 +
 .../ABI/stable/sysfs-firmware-opal-elog   |  2 +
 Documentation/ABI/stable/sysfs-hypervisor-xen |  3 +
 Documentation/ABI/stable/vdso |  5 +-
 16 files changed, 176 insertions(+), 100 deletions(-)

diff --git a/Documentation/ABI/stable/firewire-cdev 
b/Documentation/ABI/stable/firewire-cdev
index f72ed653878a..c9e8ff026154 100644
--- a/Documentation/ABI/stable/firewire-cdev
+++ b/Documentation/ABI/stable/firewire-cdev
@@ -14,12 +14,14 @@ Description:
Each /dev/fw* is associated with one IEEE 1394 node, which can
be remote or local nodes.  Operations on a /dev/fw* file have
different scope:
+
  - The 1394 node which is associated with the file:
  - Asynchronous request transmission
  - Get the Configuration ROM
  - Query node ID
  - Query maximum speed of the path between this node
and local node
+
  - The 1394 bus (i.e. "card") to which the node is attached to:
  - Isochronous stream transmission and reception
  - Asynchronous stream transmission and reception
@@ -31,6 +33,7 @@ Description:
manager
  - Query cycle time
  - Bus reset initiation, bus reset event reception
+
  - All 1394 buses:
  - Allocation of IEEE 1212 address ranges on the local
link layers, reception of inbound requests to such
@@ -43,6 +46,7 @@ Description:
userland implement different access permission models, some
operations are restricted to /dev/fw* files that are associated
with a local node:
+
  - Addition of descriptors or directories to the local
nodes' Configuration ROM
  - PHY packet transmission and reception
diff --git a/Documentation/ABI/stable/sysfs-acpi-pmprofile 
b/Documentation/ABI/stable/sysfs-acpi-pmprofile
index 964c7a8afb26..fd97d22b677f 100644
--- a/Documentation/ABI/stable/sysfs-acpi-pmprofile
+++ b/Documentation/ABI/stable/sysfs-acpi-pmprofile
@@ -6,17 +6,21 @@ Description:  The ACPI pm_profile sysfs interface exports the 
platform
power management (and performance) requirement expectations
as provided by BIOS. The integer value is directly passed as
retrieved from the FADT ACPI table.
-Values: For possible values see ACPI specification:
+
+Values:For possible values see ACPI specification:
5.2.9 Fixed ACPI Description Table (FADT)
Field: Preferred_PM_Profile
 
Currently these values are defined by spec:
-   0 Unspecified
-   1 Desktop
-   2 Mobile
-   3 Workstation
-   4 Enterprise Server
-   5 SOHO Server
-   6 Appliance PC
-   7 Performance Server
+
+   == =
+   0  Unspecified
+   1  Desktop
+   2  Mobile
+   3  Workstation
+   4  Enterprise Server
+   5  SOHO Server
+   6  Appliance PC
+   7  Performance Server
>7 Reserved
+   == =
diff --git a/Documentation/ABI/stable/sysfs-bus-firewire 
b/Documentation/ABI/stable/sysfs-bus-firewire
index 41e5a0cd1e3e..9ac9eddb82ef 100644
--- a/Documentation/ABI/stable/sysfs-bus-firewire
+++ b/Documentation/ABI/stable/sysfs-bus-firewire
@@ -47,6 +47,7 @@ Description:
IEEE 1394 node device attribute.
Read-only and immutable.
 Values:1: The sysfs entry represents a local node (a 
controller card).
+
   

[PATCH 3/4] powerpc: Reintroduce is_kvm_guest in a new avatar

2020-10-28 Thread Srikar Dronamraju
Introduce a static branch that is set during boot if the OS happens to
be a KVM guest. Subsequent checks to see if we are on KVM will rely on
this static branch. The static branch will be used in
vcpu_is_preempted() in a subsequent patch.
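
For callers, the win is that the check compiles down to a patchable
branch; a minimal usage sketch (use_kvm_fast_path() is a hypothetical
consumer):

        /*
         * After check_kvm_guest() flipped the key at boot, this is a
         * patched nop/branch rather than a load and compare.
         */
        if (is_kvm_guest())
                use_kvm_fast_path();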

Signed-off-by: Srikar Dronamraju 
Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: Gautham R Shenoy 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Juri Lelli 
Cc: Waiman Long 
Cc: Phil Auld 
---
 arch/powerpc/include/asm/kvm_guest.h | 10 ++
 arch/powerpc/include/asm/kvm_para.h  |  2 +-
 arch/powerpc/kernel/firmware.c   |  3 +++
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_guest.h 
b/arch/powerpc/include/asm/kvm_guest.h
index ba8291e02ba9..627ba272e781 100644
--- a/arch/powerpc/include/asm/kvm_guest.h
+++ b/arch/powerpc/include/asm/kvm_guest.h
@@ -7,8 +7,18 @@
 #define __POWERPC_KVM_GUEST_H__
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST)
+#include <linux/jump_label.h>
+
+DECLARE_STATIC_KEY_FALSE(kvm_guest);
+
+static inline bool is_kvm_guest(void)
+{
+   return static_branch_unlikely(&kvm_guest);
+}
+
 bool check_kvm_guest(void);
 #else
+static inline bool is_kvm_guest(void) { return false; }
 static inline bool check_kvm_guest(void) { return false; }
 #endif
 
diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index 6fba06b6cfdb..abe1b5e82547 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -14,7 +14,7 @@
 
 static inline int kvm_para_available(void)
 {
-   return IS_ENABLED(CONFIG_KVM_GUEST) && check_kvm_guest();
+   return IS_ENABLED(CONFIG_KVM_GUEST) && is_kvm_guest();
 }
 
 static inline unsigned int kvm_arch_para_features(void)
diff --git a/arch/powerpc/kernel/firmware.c b/arch/powerpc/kernel/firmware.c
index 61243267d4cf..28498fc573f2 100644
--- a/arch/powerpc/kernel/firmware.c
+++ b/arch/powerpc/kernel/firmware.c
@@ -14,6 +14,7 @@
 #include <linux/of.h>
 
 #include <asm/firmware.h>
+#include <asm/kvm_guest.h>
 
 #ifdef CONFIG_PPC64
 unsigned long powerpc_firmware_features __read_mostly;
@@ -21,6 +22,7 @@ EXPORT_SYMBOL_GPL(powerpc_firmware_features);
 #endif
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST)
+DEFINE_STATIC_KEY_FALSE(kvm_guest);
 bool check_kvm_guest(void)
 {
struct device_node *hyper_node;
@@ -32,6 +34,7 @@ bool check_kvm_guest(void)
if (!of_device_is_compatible(hyper_node, "linux,kvm"))
return 0;
 
+	static_branch_enable(&kvm_guest);
return 1;
 }
 #endif
-- 
2.18.4



[PATCH 4/4] powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted

2020-10-28 Thread Srikar Dronamraju
If it's a shared LPAR but not a KVM guest, then see if the vCPU is
related to the calling vCPU. On PowerVM, only whole cores can be
preempted, so if one vCPU is in a non-preempted state, we can infer that
all other vCPUs sharing the same core are in a non-preempted state too.
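
As a concrete illustration (assuming SMT8, i.e. threads_per_core == 8,
as in the cover letter):

	/* Illustrative values only:
	 *   cpu_first_thread_sibling(3)  == 0
	 *   cpu_first_thread_sibling(5)  == 0  -> same core as CPU 3
	 *   cpu_first_thread_sibling(13) == 8  -> a different core
	 */

So if the caller runs on CPU 5 and asks about CPU 3, both map to first
sibling 0; since CPU 5 is clearly running, its whole core, including
CPU 3, cannot be preempted.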

Signed-off-by: Srikar Dronamraju 
Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: Gautham R Shenoy 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Juri Lelli 
Cc: Waiman Long 
Cc: Phil Auld 
---
 arch/powerpc/include/asm/paravirt.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h
index 9362c94fe3aa..edc08f04aef7 100644
--- a/arch/powerpc/include/asm/paravirt.h
+++ b/arch/powerpc/include/asm/paravirt.h
@@ -10,6 +10,9 @@
 #endif
 
 #ifdef CONFIG_PPC_SPLPAR
+#include <asm/kvm_guest.h>
+#include <asm/cputhreads.h>
+
 DECLARE_STATIC_KEY_FALSE(shared_processor);
 
 static inline bool is_shared_processor(void)
@@ -74,6 +77,21 @@ static inline bool vcpu_is_preempted(int cpu)
 {
if (!is_shared_processor())
return false;
+
+#ifdef CONFIG_PPC_SPLPAR
+   if (!is_kvm_guest()) {
+   int first_cpu = cpu_first_thread_sibling(smp_processor_id());
+
+   /*
+* Preemption can only happen at core granularity. This CPU
+* is not preempted if one of the CPU of this core is not
+* preempted.
+*/
+   if (cpu_first_thread_sibling(cpu) == first_cpu)
+   return false;
+   }
+#endif
+
if (yield_count_of(cpu) & 1)
return true;
return false;
-- 
2.18.4



[PATCH 1/4] powerpc: Refactor is_kvm_guest declaration to new header

2020-10-28 Thread Srikar Dronamraju
Only code/declaration movement, in anticipation of doing a KVM-aware
vcpu_is_preempted(). No additional changes.

Signed-off-by: Srikar Dronamraju 
Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: Gautham R Shenoy 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Juri Lelli 
Cc: Waiman Long 
Cc: Phil Auld 
---
 arch/powerpc/include/asm/firmware.h  |  6 --
 arch/powerpc/include/asm/kvm_guest.h | 15 +++
 arch/powerpc/include/asm/kvm_para.h  |  2 +-
 arch/powerpc/platforms/pseries/smp.c |  1 +
 4 files changed, 17 insertions(+), 7 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_guest.h

diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
index 0b295bdb201e..aa6a5ef5d483 100644
--- a/arch/powerpc/include/asm/firmware.h
+++ b/arch/powerpc/include/asm/firmware.h
@@ -134,12 +134,6 @@ extern int ibm_nmi_interlock_token;
 
 extern unsigned int __start___fw_ftr_fixup, __stop___fw_ftr_fixup;
 
-#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST)
-bool is_kvm_guest(void);
-#else
-static inline bool is_kvm_guest(void) { return false; }
-#endif
-
 #ifdef CONFIG_PPC_PSERIES
 void pseries_probe_fw_features(void);
 #else
diff --git a/arch/powerpc/include/asm/kvm_guest.h b/arch/powerpc/include/asm/kvm_guest.h
new file mode 100644
index ..c0ace884a0e8
--- /dev/null
+++ b/arch/powerpc/include/asm/kvm_guest.h
@@ -0,0 +1,15 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2020  IBM Corporation
+ */
+
+#ifndef __POWERPC_KVM_GUEST_H__
+#define __POWERPC_KVM_GUEST_H__
+
+#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST)
+bool is_kvm_guest(void);
+#else
+static inline bool is_kvm_guest(void) { return false; }
+#endif
+
+#endif /* __POWERPC_KVM_GUEST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index 744612054c94..abe1b5e82547 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -8,7 +8,7 @@
 #ifndef __POWERPC_KVM_PARA_H__
 #define __POWERPC_KVM_PARA_H__
 
-#include <asm/firmware.h>
+#include <asm/kvm_guest.h>
 
 #include <uapi/asm/kvm_para.h>
 
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index 92922491a81c..d578732c545d 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -42,6 +42,7 @@
 #include <asm/plpar_wrappers.h>
 #include <asm/code-patching.h>
 #include <asm/svm.h>
+#include <asm/kvm_guest.h>
 
 #include "pseries.h"
 
-- 
2.18.4



[PATCH 0/4] Powerpc: Better preemption for shared processor

2020-10-28 Thread Srikar Dronamraju
Currently, vcpu_is_preempted() relies only on the yield_count for
shared_processor. On a PowerVM LPAR, Phyp schedules at the SMT8 core
boundary, i.e. all CPUs belonging to a core are either group scheduled
in or group scheduled out. This can be used to better predict
non-preempted CPUs on PowerVM shared LPARs.

perf stat -r 5 -a perf bench sched pipe -l 1000 (lesser time is better)

powerpc/next
 35,107,951.20 msec cpu-clock #  255.898 CPUs utilized  
  ( +-  0.31% )
23,655,348  context-switches  #0.674 K/sec  
  ( +-  3.72% )
14,465  cpu-migrations#0.000 K/sec  
  ( +-  5.37% )
82,463  page-faults   #0.002 K/sec  
  ( +-  8.40% )
 1,127,182,328,206  cycles#0.032 GHz
  ( +-  1.60% )  (66.67%)
78,587,300,622  stalled-cycles-frontend   #6.97% frontend cycles 
idle ( +-  0.08% )  (50.01%)
   654,124,218,432  stalled-cycles-backend#   58.03% backend cycles 
idle  ( +-  1.74% )  (50.01%)
   834,013,059,242  instructions  #0.74  insn per cycle
  #0.78  stalled cycles per 
insn  ( +-  0.73% )  (66.67%)
   132,911,454,387  branches  #3.786 M/sec  
  ( +-  0.59% )  (50.00%)
 2,890,882,143  branch-misses #2.18% of all branches
  ( +-  0.46% )  (50.00%)

   137.195 +- 0.419 seconds time elapsed  ( +-  0.31% )

powerpc/next + patchset
 29,981,702.64 msec cpu-clock #  255.881 CPUs utilized  
  ( +-  1.30% )
40,162,456  context-switches  #0.001 M/sec  
  ( +-  0.01% )
 1,110  cpu-migrations#0.000 K/sec  
  ( +-  5.20% )
62,616  page-faults   #0.002 K/sec  
  ( +-  3.93% )
 1,430,030,626,037  cycles#0.048 GHz
  ( +-  1.41% )  (66.67%)
83,202,707,288  stalled-cycles-frontend   #5.82% frontend cycles 
idle ( +-  0.75% )  (50.01%)
   744,556,088,520  stalled-cycles-backend#   52.07% backend cycles 
idle  ( +-  1.39% )  (50.01%)
   940,138,418,674  instructions  #0.66  insn per cycle
  #0.79  stalled cycles per 
insn  ( +-  0.51% )  (66.67%)
   146,452,852,283  branches  #4.885 M/sec  
  ( +-  0.80% )  (50.00%)
 3,237,743,996  branch-misses #2.21% of all branches
  ( +-  1.18% )  (50.01%)

117.17 +- 1.52 seconds time elapsed  ( +-  1.30% )

This is around a 14.6% improvement in performance.
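(For reference, that figure follows from the elapsed times above:
(137.195 - 117.17) / 137.195 is ~14.6% less elapsed time.)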

Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: Gautham R Shenoy 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Juri Lelli 
Cc: Waiman Long 
Cc: Phil Auld 

Srikar Dronamraju (4):
  powerpc: Refactor is_kvm_guest declaration to new header
  powerpc: Rename is_kvm_guest to check_kvm_guest
  powerpc: Reintroduce is_kvm_guest
  powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted

 arch/powerpc/include/asm/firmware.h  |  6 --
 arch/powerpc/include/asm/kvm_guest.h | 25 +
 arch/powerpc/include/asm/kvm_para.h  |  2 +-
 arch/powerpc/include/asm/paravirt.h  | 18 ++
 arch/powerpc/kernel/firmware.c   |  5 -
 arch/powerpc/platforms/pseries/smp.c |  3 ++-
 6 files changed, 50 insertions(+), 9 deletions(-)
 create mode 100644 arch/powerpc/include/asm/kvm_guest.h

-- 
2.18.4



[PATCH 2/4] powerpc: Rename is_kvm_guest to check_kvm_guest

2020-10-28 Thread Srikar Dronamraju
is_kvm_guest() will be reused in a subsequent patch in a new avatar.
Hence rename is_kvm_guest() to check_kvm_guest(). No additional changes.

Signed-off-by: Srikar Dronamraju 
Cc: linuxppc-dev 
Cc: LKML 
Cc: Michael Ellerman 
Cc: Nicholas Piggin 
Cc: Nathan Lynch 
Cc: Gautham R Shenoy 
Cc: Peter Zijlstra 
Cc: Valentin Schneider 
Cc: Juri Lelli 
Cc: Waiman Long 
Cc: Phil Auld 
---
 arch/powerpc/include/asm/kvm_guest.h | 4 ++--
 arch/powerpc/include/asm/kvm_para.h  | 2 +-
 arch/powerpc/kernel/firmware.c   | 2 +-
 arch/powerpc/platforms/pseries/smp.c | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_guest.h b/arch/powerpc/include/asm/kvm_guest.h
index c0ace884a0e8..ba8291e02ba9 100644
--- a/arch/powerpc/include/asm/kvm_guest.h
+++ b/arch/powerpc/include/asm/kvm_guest.h
@@ -7,9 +7,9 @@
 #define __POWERPC_KVM_GUEST_H__
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST)
-bool is_kvm_guest(void);
+bool check_kvm_guest(void);
 #else
-static inline bool is_kvm_guest(void) { return false; }
+static inline bool check_kvm_guest(void) { return false; }
 #endif
 
 #endif /* __POWERPC_KVM_GUEST_H__ */
diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h
index abe1b5e82547..6fba06b6cfdb 100644
--- a/arch/powerpc/include/asm/kvm_para.h
+++ b/arch/powerpc/include/asm/kvm_para.h
@@ -14,7 +14,7 @@
 
 static inline int kvm_para_available(void)
 {
-   return IS_ENABLED(CONFIG_KVM_GUEST) && is_kvm_guest();
+   return IS_ENABLED(CONFIG_KVM_GUEST) && check_kvm_guest();
 }
 
 static inline unsigned int kvm_arch_para_features(void)
diff --git a/arch/powerpc/kernel/firmware.c b/arch/powerpc/kernel/firmware.c
index fe48d319d490..61243267d4cf 100644
--- a/arch/powerpc/kernel/firmware.c
+++ b/arch/powerpc/kernel/firmware.c
@@ -21,7 +21,7 @@ EXPORT_SYMBOL_GPL(powerpc_firmware_features);
 #endif
 
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST)
-bool is_kvm_guest(void)
+bool check_kvm_guest(void)
 {
struct device_node *hyper_node;
 
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index d578732c545d..c70b4be9f0a5 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -211,7 +211,7 @@ static __init void pSeries_smp_probe(void)
if (!cpu_has_feature(CPU_FTR_SMT))
return;
 
-   if (is_kvm_guest()) {
+   if (check_kvm_guest()) {
/*
 * KVM emulates doorbells by disabling FSCR[MSGP] so msgsndp
 * faults to the hypervisor which then reads the instruction
-- 
2.18.4



Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation

2020-10-28 Thread Mike Rapoport
On Wed, Oct 28, 2020 at 12:17:35PM +0100, David Hildenbrand wrote:
> On 28.10.20 12:09, Mike Rapoport wrote:
> > On Tue, Oct 27, 2020 at 09:46:35AM +0100, David Hildenbrand wrote:
> > > On 27.10.20 09:38, Mike Rapoport wrote:
> > > > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote:
> > > > 
> > > > > Beyond whatever you are seeing, for the latter case of new things
> > > > > getting introduced to an interface with hidden dependencies... Another
> > > > > edge case could be a new caller to set_memory_np() could result in
> > > > > large NP pages. None of the callers today should cause this AFAICT, 
> > > > > but
> > > > > it's not great to rely on the callers to know these details.
> > 
> > > > A caller of set_memory_*() or set_direct_map_*() should expect a failure
> > > > and be ready for that. So adding a WARN to safe_copy_page() is the first
> > > > step in that direction :)
> > > > 
> > > 
> > > I am probably missing something important, but why are we saving/restoring
> > > the content of pages that were explicitly removed from the identity 
> > > mapping
> > > such that nobody will access them?
> > 
> > Actually, we should not be saving/restoring free pages during
> > hibernation as there are several calls to mark_free_pages() that should
> > exclude the free pages from the snapshot. I've tried to find why the fix
> > that maps/unmaps a page to save it was required in the first place, but
> > I could not find bug reports.
> > 
> > The closest I've got is an email from Rafael that asked to update
> > "hibernate: handle DEBUG_PAGEALLOC" patch:
> > 
> > https://lore.kernel.org/linux-pm/200802200133.44098@sisk.pl/
> > 
> > Could it be that safe_copy_page() tries to work around a non-existent
> > problem?
> > 
> 
> Clould be! Also see
> 
> https://lkml.kernel.org/r/38de5bb0-5559-d069-0ce0-daec66ef2...@suse.cz
> 
> which restores free page content based on more kernel parameters, not based
> on the original content.

Ah, after looking at it now I've run the kernel with DEBUG_PAGEALLOC=y and
CONFIG_INIT_ON_FREE_DEFAULT_ON=y and the restore crashed nicely.

[   27.210093] PM: Image successfully loaded
[   27.226709] Disabling non-boot CPUs ...  
[   27.231208] smpboot: CPU 1 is now offline
[   27.363926] kvm-clock: cpu 0, msr 5c889001, primary cpu clock, resume
[   27.363995] BUG: unable to handle page fault for address: 9f7a40108000   
[   27.367996] #PF: supervisor write access in kernel mode  
[   27.369558] #PF: error_code(0x0002) - not-present page   
[   27.371098] PGD 5ca01067 P4D 5ca01067 PUD 5ca02067 PMD 5ca03067 PTE 800ff
fef7060 
[   27.373421] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC PTI  
[   27.374905] CPU: 0 PID: 1200 Comm: bash Not tainted 5.10.0-rc1 #5
[   27.376700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14
.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 
[   27.379879] RIP: 0010:clear_page_rep+0x7/0x10  
[   27.381218] Code: e8 be 88 75 00 44 89 e2 48 89 ee 48 89 df e8 60 ff ff ff c6
 03 00 5b 5d 41 5c c3 cc cc cc cc cc cc cc cc b9 00 02 00 00 31 c0  48 ab c3
 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00  
[   27.386457] RSP: 0018:b6838046be08 EFLAGS: 00010046  
[   27.388011] RAX:  RBX: 9f7a487c0ec0 RCX: 0200
[   27.390082] RDX: 9f7a4c788000 RSI:  RDI: 9f7a40108000
[   27.392138] RBP: 8629c860 R08:  R09: 0007
[   27.394205] R10: 0004 R11: b6838046bbf8 R12: 
[   27.396271] R13: 9f7a419a62a0 R14: 0005 R15: 9f7a484f4da0
[   27.398334] FS:  7fe0c3f6a700() GS:9f7abf80() 
knlGS:  
   
[   27.400717] CS:  0010 DS:  ES:  CR0: 80050033
[   27.402432] CR2: 9f7a40108000 CR3: 0859a001 CR4: 00060ef0
[   27.404485] Call Trace:  
[   27.405326]  clear_free_pages+0xf5/0x150 
[   27.406568]  hibernation_snapshot+0x390/0x3d0
[   27.407908]  hibernate+0xdb/0x240
[   27.408978]  state_store+0xd7/0xe0   
[   27.410078]  kernfs_fop_write+0x10e/0x1a0
[   27.411333]  vfs_write+0xbb/0x210
[   27.412423]  ksys_write+0x9c/0xd0  
[   27.413488]  do_syscall_64+0x33/0x40 
[   27.414636]  entry_SYSCALL_64_after_hwframe+0x44/0xa9 

[PATCH 5/9] kprobes/ftrace: Add recursion protection to the ftrace callback

2020-10-28 Thread Steven Rostedt
From: "Steven Rostedt (VMware)" 

If an ftrace callback does not supply its own recursion protection and
does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will
make a helper trampoline to do so before calling the callback instead of
just calling the callback directly.

The default for ftrace_ops is going to assume recursion protection unless
otherwise specified.
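
A minimal sketch of the protection pattern this applies in each arch's
kprobe_ftrace_handler() (condensed from the diffs below; the actual
handler body is elided):

	void my_ftrace_callback(unsigned long ip, unsigned long parent_ip,
				struct ftrace_ops *ops, struct pt_regs *regs)
	{
		int bit = ftrace_test_recursion_trylock();

		if (bit < 0)		/* already recursing: bail out */
			return;
		preempt_disable_notrace();

		/* ... the actual callback work goes here ... */

		preempt_enable_notrace();
		ftrace_test_recursion_unlock(bit);
	}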

Cc: Masami Hiramatsu 
Cc: Guo Ren 
Cc: "James E.J. Bottomley" 
Cc: Helge Deller 
Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: Thomas Gleixner 
Cc: Borislav Petkov 
Cc: x...@kernel.org
Cc: "H. Peter Anvin" 
Cc: "Naveen N. Rao" 
Cc: Anil S Keshavamurthy 
Cc: "David S. Miller" 
Cc: linux-c...@vger.kernel.org
Cc: linux-par...@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s...@vger.kernel.org
Signed-off-by: Steven Rostedt (VMware) 
---
 arch/csky/kernel/probes/ftrace.c | 12 ++--
 arch/parisc/kernel/ftrace.c  | 13 +++--
 arch/powerpc/kernel/kprobes-ftrace.c | 11 ++-
 arch/s390/kernel/ftrace.c| 13 +++--
 arch/x86/kernel/kprobes/ftrace.c | 12 ++--
 5 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c
index 5264763d05be..5eb2604fdf71 100644
--- a/arch/csky/kernel/probes/ftrace.c
+++ b/arch/csky/kernel/probes/ftrace.c
@@ -13,16 +13,21 @@ int arch_check_ftrace_location(struct kprobe *p)
 void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip,
   struct ftrace_ops *ops, struct pt_regs *regs)
 {
+   int bit;
bool lr_saver = false;
struct kprobe *p;
struct kprobe_ctlblk *kcb;
 
-   /* Preempt is disabled by ftrace */
+   bit = ftrace_test_recursion_trylock();
+   if (bit < 0)
+   return;
+
+   preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)ip);
if (!p) {
p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE));
if (unlikely(!p) || kprobe_disabled(p))
-   return;
+   goto out;
lr_saver = true;
}
 
@@ -56,6 +61,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
 */
__this_cpu_write(current_kprobe, NULL);
}
+out:
+   preempt_enable_notrace();
+   ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
 
diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c
index 4bab21c71055..5f7742b225a5 100644
--- a/arch/parisc/kernel/ftrace.c
+++ b/arch/parisc/kernel/ftrace.c
@@ -208,13 +208,19 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
long parent_ip,
 {
struct kprobe_ctlblk *kcb;
struct kprobe *p = get_kprobe((kprobe_opcode_t *)ip);
+   int bit;
 
-   if (unlikely(!p) || kprobe_disabled(p))
+   bit = ftrace_test_recursion_trylock();
+   if (bit < 0)
return;
 
+   preempt_disable_notrace();
+   if (unlikely(!p) || kprobe_disabled(p))
+   goto out;
+
if (kprobe_running()) {
kprobes_inc_nmissed_count(p);
-   return;
+   goto out;
}
 
__this_cpu_write(current_kprobe, p);
@@ -235,6 +241,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long 
parent_ip,
}
}
__this_cpu_write(current_kprobe, NULL);
+out:
+   preempt_enable_notrace();
+   ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
 
diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c
index 972cb28174b2..5df8d50c65ae 100644
--- a/arch/powerpc/kernel/kprobes-ftrace.c
+++ b/arch/powerpc/kernel/kprobes-ftrace.c
@@ -18,10 +18,16 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
parent_nip,
 {
struct kprobe *p;
struct kprobe_ctlblk *kcb;
+   int bit;
 
+   bit = ftrace_test_recursion_trylock();
+   if (bit < 0)
+   return;
+
+   preempt_disable_notrace();
p = get_kprobe((kprobe_opcode_t *)nip);
if (unlikely(!p) || kprobe_disabled(p))
-   return;
+   goto out;
 
kcb = get_kprobe_ctlblk();
if (kprobe_running()) {
@@ -52,6 +58,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long 
parent_nip,
 */
__this_cpu_write(current_kprobe, NULL);
}
+out:
+   preempt_enable_notrace();
+   ftrace_test_recursion_unlock(bit);
 }
 NOKPROBE_SYMBOL(kprobe_ftrace_handler);
 
diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c
index b388e87a08bf..88466d7fb6b2 100644
--- a/arch/s390/kernel/ftrace.c
+++ b/arch/s390/kernel/ftrace.c
@@ -202,13 +202,19 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned 
long 

Test Results: RE: powerpc: avoid broken GCC __attribute__((optimize))

2020-10-28 Thread snowpatch
Thanks for your contribution, unfortunately we've found some issues.

Your patch was successfully applied on branch powerpc/merge 
(8cb17737940b156329cb5210669b9c9b23f4dd56)

The test build-ppc64le reported the following: Build failed!

 Full log: 
https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/21048//artifact/linux/report.txt


Here's a preview of the log:

arch/powerpc/kernel/paca.c:244:25: error: expected '=', ',', ';', 'asm' or 
'__attribute__' before 'setup_paca'
  244 | void __nostackprotector setup_paca(struct paca_struct *new_paca)
  | ^~
make[2]: *** [scripts/Makefile.build:283: arch/powerpc/kernel/paca.o] Error 1
make[1]: *** [scripts/Makefile.build:500: arch/powerpc/kernel] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [Makefile:1799: arch/powerpc] Error 2
make: *** Waiting for unfinished jobs


The test build-ppc64be reported the following: Build failed!

 Full log: 
https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/21049//artifact/linux/report.txt


Here's a preview of the log:

arch/powerpc/kernel/paca.c:244:25: error: expected '=', ',', ';', 'asm' or 
'__attribute__' before 'setup_paca'
  244 | void __nostackprotector setup_paca(struct paca_struct *new_paca)
  | ^~
make[2]: *** [scripts/Makefile.build:283: arch/powerpc/kernel/paca.o] Error 1
make[1]: *** [scripts/Makefile.build:500: arch/powerpc/kernel] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [Makefile:1799: arch/powerpc] Error 2
make: *** Waiting for unfinished jobs


The test build-ppc64e reported the following: Build failed!

 Full log: 
https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/21050//artifact/linux/report.txt


Here's a preview of the log:

arch/powerpc/kernel/paca.c:244:25: error: expected '=', ',', ';', 'asm' or 
'__attribute__' before 'setup_paca'
  244 | void __nostackprotector setup_paca(struct paca_struct *new_paca)
  | ^~
make[2]: *** [scripts/Makefile.build:283: arch/powerpc/kernel/paca.o] Error 1
make[1]: *** [scripts/Makefile.build:500: arch/powerpc/kernel] Error 2
make[1]: *** Waiting for unfinished jobs
make: *** [Makefile:1799: arch/powerpc] Error 2
make: *** Waiting for unfinished jobs





Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation

2020-10-28 Thread Mike Rapoport
On Wed, Oct 28, 2020 at 11:20:12AM +, Will Deacon wrote:
> On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote:
> > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote:
> > > On Mon, 2020-10-26 at 11:05 +0200, Mike Rapoport wrote:
> > > > On Mon, Oct 26, 2020 at 01:13:52AM +, Edgecombe, Rick P wrote:
> > > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote:
> > > > > > Indeed, for architectures that define
> > > > > > CONFIG_ARCH_HAS_SET_DIRECT_MAP
> > > > > > it is
> > > > > > possible that __kernel_map_pages() would fail, but since this
> > > > > > function is
> > > > > > void, the failure will go unnoticed.
> > > > > 
> > > > > Could you elaborate on how this could happen? Do you mean during
> > > > > runtime today or if something new was introduced?
> > > > 
> > > > A failure in __kernel_map_pages() may happen today. For instance, on
> > > > x86
> > > > if the kernel is built with DEBUG_PAGEALLOC.
> > > > 
> > > > __kernel_map_pages(page, 1, 0);
> > > > 
> > > > will need to split, say, 2M page and during the split an allocation
> > > > of
> > > > page table could fail.
> > > 
> > > On x86 at least, DEBUG_PAGEALLOC expects to never have to break a page
> > > on the direct map and even disables locking in cpa because it assumes
> > > this. If this is happening somehow anyway then we should probably fix
> > > that. Even if it's a debug feature, it will not be as useful if it is
> > > causing its own crashes.
> > > 
> > > I'm still wondering if there is something I'm missing here. It seems
> > > like you are saying there is a bug in some arch's, so let's add a WARN
> > > in cross-arch code to log it as it crashes. A warn and making things
> > > clearer seem like good ideas, but if there is a bug we should fix it.
> > > The code around the callers still functionally assume re-mapping can't
> > > fail.
> > 
> > Oh, I meant an x86 kernel *without* DEBUG_PAGEALLOC, and indeed the call
> > that unmaps pages back in safe_copy_page will just reset a 4K page to
> > NP because whatever made it NP in the first place already did the split.
> > 
> > Still, on arm64 with DEBUG_PAGEALLOC=n there is a possibility of a race
> > between map/unmap dance in __vunmap() and safe_copy_page() that may
> > cause access to unmapped memory:
> > 
> > __vunmap()
> > vm_remove_mappings()
> > set_direct_map_invalid()
> > safe_copy_page()
> > __kernel_map_pages()
> > return
> > do_copy_page() -> fault
> > 
> > This is a theoretical bug, but it is still not nice :)  
> > 
> 
> Just to clarify: this patch series fixes this problem, right?

Yes.

> Will

-- 
Sincerely yours,
Mike.


Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation

2020-10-28 Thread Will Deacon
On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote:
> On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote:
> > On Mon, 2020-10-26 at 11:05 +0200, Mike Rapoport wrote:
> > > On Mon, Oct 26, 2020 at 01:13:52AM +, Edgecombe, Rick P wrote:
> > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote:
> > > > > Indeed, for architectures that define
> > > > > CONFIG_ARCH_HAS_SET_DIRECT_MAP
> > > > > it is
> > > > > possible that __kernel_map_pages() would fail, but since this
> > > > > function is
> > > > > void, the failure will go unnoticed.
> > > > 
> > > > Could you elaborate on how this could happen? Do you mean during
> > > > runtime today or if something new was introduced?
> > > 
> > > A failure in __kernel_map_pages() may happen today. For instance, on
> > > x86
> > > if the kernel is built with DEBUG_PAGEALLOC.
> > > 
> > > __kernel_map_pages(page, 1, 0);
> > > 
> > > will need to split, say, 2M page and during the split an allocation
> > > of
> > > page table could fail.
> > 
> > On x86 at least, DEBUG_PAGEALLOC expects to never have to break a page
> > on the direct map and even disables locking in cpa because it assumes
> > this. If this is happening somehow anyway then we should probably fix
> > that. Even if it's a debug feature, it will not be as useful if it is
> > causing its own crashes.
> > 
> > I'm still wondering if there is something I'm missing here. It seems
> > like you are saying there is a bug in some arch's, so let's add a WARN
> > in cross-arch code to log it as it crashes. A warn and making things
> > clearer seem like good ideas, but if there is a bug we should fix it.
> > The code around the callers still functionally assume re-mapping can't
> > fail.
> 
> Oh, I meant an x86 kernel *without* DEBUG_PAGEALLOC, and indeed the call
> that unmaps pages back in safe_copy_page will just reset a 4K page to
> NP because whatever made it NP in the first place already did the split.
> 
> Still, on arm64 with DEBUG_PAGEALLOC=n there is a possibility of a race
> between map/unmap dance in __vunmap() and safe_copy_page() that may
> cause access to unmapped memory:
> 
> __vunmap()
> vm_remove_mappings()
> set_direct_map_invalid()
>   safe_copy_page()
>   __kernel_map_pages()
>   return
>   do_copy_page() -> fault
>   
> This is a theoretical bug, but it is still not nice :)
> 

Just to clarify: this patch series fixes this problem, right?

Will


Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation

2020-10-28 Thread David Hildenbrand

On 28.10.20 12:09, Mike Rapoport wrote:
> On Tue, Oct 27, 2020 at 09:46:35AM +0100, David Hildenbrand wrote:
> > On 27.10.20 09:38, Mike Rapoport wrote:
> > > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote:
> > > > Beyond whatever you are seeing, for the latter case of new things
> > > > getting introduced to an interface with hidden dependencies... Another
> > > > edge case could be a new caller to set_memory_np() could result in
> > > > large NP pages. None of the callers today should cause this AFAICT, but
> > > > it's not great to rely on the callers to know these details.
> > > 
> > > A caller of set_memory_*() or set_direct_map_*() should expect a failure
> > > and be ready for that. So adding a WARN to safe_copy_page() is the first
> > > step in that direction :)
> > 
> > I am probably missing something important, but why are we saving/restoring
> > the content of pages that were explicitly removed from the identity mapping
> > such that nobody will access them?
> 
> Actually, we should not be saving/restoring free pages during
> hibernation as there are several calls to mark_free_pages() that should
> exclude the free pages from the snapshot. I've tried to find why the fix
> that maps/unmaps a page to save it was required in the first place, but
> I could not find bug reports.
> 
> The closest I've got is an email from Rafael that asked to update
> "hibernate: handle DEBUG_PAGEALLOC" patch:
> 
> https://lore.kernel.org/linux-pm/200802200133.44098@sisk.pl/
> 
> Could it be that safe_copy_page() tries to work around a non-existent
> problem?

Could be! Also see

https://lkml.kernel.org/r/38de5bb0-5559-d069-0ce0-daec66ef2...@suse.cz

which restores free page content based on more kernel parameters, not based
on the original content.


--
Thanks,

David / dhildenb



Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation

2020-10-28 Thread Mike Rapoport
On Tue, Oct 27, 2020 at 09:46:35AM +0100, David Hildenbrand wrote:
> On 27.10.20 09:38, Mike Rapoport wrote:
> > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote:
> > 
> > > Beyond whatever you are seeing, for the latter case of new things
> > > getting introduced to an interface with hidden dependencies... Another
> > > edge case could be a new caller to set_memory_np() could result in
> > > large NP pages. None of the callers today should cause this AFAICT, but
> > > it's not great to rely on the callers to know these details.

> > A caller of set_memory_*() or set_direct_map_*() should expect a failure
> > and be ready for that. So adding a WARN to safe_copy_page() is the first
> > step in that direction :)
> > 
> 
> I am probably missing something important, but why are we saving/restoring
> the content of pages that were explicitly removed from the identity mapping
> such that nobody will access them?

Actually, we should not be saving/restoring free pages during
hibernation as there are several calls to mark_free_pages() that should
exclude the free pages from the snapshot. I've tried to find why the fix
that maps/unmaps a page to save it was required in the first place, but
I could not find bug reports.

The closest I've got is an email from Rafael that asked to update
"hibernate: handle DEBUG_PAGEALLOC" patch:

https://lore.kernel.org/linux-pm/200802200133.44098@sisk.pl/

Could it be that safe_copy_page() tries to work around a non-existent
problem?

-- 
Sincerely yours,
Mike.


Re: [PATCH 2/4] PM: hibernate: improve robustness of mapping pages in the direct map

2020-10-28 Thread Mike Rapoport
On Tue, Oct 27, 2020 at 10:44:21PM +, Edgecombe, Rick P wrote:
> On Tue, 2020-10-27 at 10:49 +0200, Mike Rapoport wrote:
> > On Mon, Oct 26, 2020 at 06:57:32PM +, Edgecombe, Rick P wrote:
> > > On Mon, 2020-10-26 at 11:15 +0200, Mike Rapoport wrote:
> > > > On Mon, Oct 26, 2020 at 12:38:32AM +, Edgecombe, Rick P
> > > > wrote:
> > > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote:
> > > > > > From: Mike Rapoport 
> > > > > > 
> > > > > > When DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP is enabled a
> > > > > > page
> > > > > > may
> > > > > > be
> > > > > > not present in the direct map and has to be explicitly mapped
> > > > > > before
> > > > > > it
> > > > > > could be copied.
> > > > > > 
> > > > > > On arm64 it is possible that a page would be removed from the
> > > > > > direct
> > > > > > map
> > > > > > using set_direct_map_invalid_noflush() but
> > > > > > __kernel_map_pages()
> > > > > > will
> > > > > > refuse
> > > > > > to map this page back if DEBUG_PAGEALLOC is disabled.
> > > > > 
> > > > > It looks to me that arm64 __kernel_map_pages() will still
> > > > > attempt
> > > > > to
> > > > > map it if rodata_full is true, how does this happen?
> > > > 
> > > > Unless I misread the code, arm64 requires both rodata_full and
> > > > debug_pagealloc_enabled() to be true for __kernel_map_pages() to
> > > > do
> > > > anything.
> > > > But rodata_full condition applies to set_direct_map_*_noflush()
> > > > as
> > > > well,
> > > > so with !rodata_full the linear map won't be ever changed.
> > > 
> > > Hmm, looks to me that __kernel_map_pages() will only skip it if
> > > both
> > > debug pagealloc and rodata_full are false.
> > > 
> > > But now I'm wondering if maybe we could simplify things by just
> > > moving
> > > the hibernate unmapped page logic off of the direct map. On x86,
> > > text_poke() used to use this reserved fixmap pte thing that it
> > > could
> > > rely on to remap memory with. If hibernate had some separate pte
> > > for
> > > remapping like that, then we could not have any direct map
> > > restrictions
> > > caused by it/kernel_map_pages(), and it wouldn't have to worry
> > > about
> > > relying on anything else.
> > 
> > Well, there is map_kernel_range() that can be used by hibernation as
> > there is no requirement for particular virtual address, but that
> > would
> > be quite costly if done for every page.
> > 
> > Maybe we can do something like
> > 
> > if (kernel_page_present(s_page)) {
> > do_copy_page(dst, page_address(s_page));
> > } else {
> > map_kernel_range_noflush(page_address(page), PAGE_SIZE,
> > 				 PROT_READ, &page);
> > do_copy_page(dst, page_address(s_page));
> > unmap_kernel_range_noflush(page_address(page),
> > PAGE_SIZE);
> > }
> > 
> > But it seems that a prerequisite for changing the way a page is
> > mapped
> > in safe_copy_page() would be to teach hibernation that a mapping here
> > may fail.
> > 
> Yea that is what I meant, the direct map could still be used for mapped
> pages.
> 
> But for the unmapped case it could have a pre-setup 4k pte for some non
> direct map address. Then just change the pte to point to any unmapped
> direct map page that was encountered. The point would be to give
> hibernate some 4k pte of its own to manipulate so that it can't fail.
> 
> Yet another option would be have hibernate_map_page() just map large
> pages if it finds them.
> 
> So we could teach hibernate to handle mapping failures, OR we could
> change it so it doesn't rely on direct map page sizes in order to
> succeed. The latter seems better to me since there isn't a reason why
> it should have to fail and the resulting logic might be simpler. Both
> seem like improvements in robustness though.

That's correct, but as the purpose of this series is to prevent usage of
__kernel_map_pages() outside DEBUG_PAGEALLOC, for now I'm going to update this
patch's changelog and add a comment to hibernate_map_page().

-- 
Sincerely yours,
Mike.


[PATCH] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()

2020-10-28 Thread Qinglang Miao
The iounmap() of msgr_block_addr before returning from
mpic_msgr_probe() in the error handling path is missing. Use
devm_ioremap() instead of plain ioremap() when remapping the message
register block, so the mapping will be automatically released on
probe failure.
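
For reference, the managed variant ties the mapping's lifetime to the
device, so no explicit iounmap() is needed on any later error path (a
minimal sketch using the names from the diff, not the full probe
function):

	msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start,
				       resource_size(&rsrc));
	if (!msgr_block_addr)
		return -EFAULT;
	/* ... any later 'return err;' no longer leaks the mapping ... */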

Signed-off-by: Qinglang Miao 
---
 arch/powerpc/sysdev/mpic_msgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/mpic_msgr.c b/arch/powerpc/sysdev/mpic_msgr.c
index f6b253e2b..36ec0bdd8 100644
--- a/arch/powerpc/sysdev/mpic_msgr.c
+++ b/arch/powerpc/sysdev/mpic_msgr.c
@@ -191,7 +191,7 @@ static int mpic_msgr_probe(struct platform_device *dev)
 
/* IO map the message register block. */
 	of_address_to_resource(np, 0, &rsrc);
-	msgr_block_addr = ioremap(rsrc.start, resource_size(&rsrc));
+	msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start, resource_size(&rsrc));
 	if (!msgr_block_addr) {
 		dev_err(&dev->dev, "Failed to iomap MPIC message registers");
return -EFAULT;
-- 
2.23.0



Re: [PATCH] powerpc: avoid broken GCC __attribute__((optimize))

2020-10-28 Thread Ard Biesheuvel
On Wed, 28 Oct 2020 at 09:04, Ard Biesheuvel  wrote:
>
> Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
> introduced a couple of uses of __attribute__((optimize)) with function
> scope, to disable the stack protector in some early boot code.
>
> Unfortunately, and this is documented in the GCC man pages [0], overriding
> function attributes for optimization is broken, and is only supported for
> debug scenarios, not for production: the problem appears to be that
> setting GCC -f flags using this method will cause it to forget about some
> or all other optimization settings that have been applied.
>
> So the only safe way to disable the stack protector is to disable it for
> the entire source file.
>
> [0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html
>
> Cc: Michael Ellerman 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Nick Desaulniers 
> Cc: Arvind Sankar 
> Cc: Randy Dunlap 
> Cc: Josh Poimboeuf 
> Cc: Thomas Gleixner 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> Cc: Peter Zijlstra (Intel) 
> Cc: Geert Uytterhoeven 
> Cc: Kees Cook 
> Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
> Signed-off-by: Ard Biesheuvel 
> ---
> Related discussion here:
> https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=gfhpzoj_hc...@mail.gmail.com/
>
> TL;DR using __attribute__((optimize("-fno-gcse"))) in the BPF interpreter
> causes the compiler to forget about -fno-asynchronous-unwind-tables passed
> on the command line, resulting in unexpected .eh_frame sections in vmlinux.
>
>  arch/powerpc/kernel/Makefile   | 3 +++
>  arch/powerpc/kernel/paca.c | 2 +-
>  arch/powerpc/kernel/setup.h| 6 --
>  arch/powerpc/kernel/setup_64.c | 2 +-
>  4 files changed, 5 insertions(+), 8 deletions(-)
>

FYI I was notified by one of the robots that I missed one occurrence
of __nostackprotector in arch/powerpc/kernel/paca.c

Let me know if I need to resend.


> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
> index bf0bf1b900d2..fe2ef598e2ea 100644
> --- a/arch/powerpc/kernel/Makefile
> +++ b/arch/powerpc/kernel/Makefile
> @@ -173,6 +173,9 @@ KCOV_INSTRUMENT_cputable.o := n
>  KCOV_INSTRUMENT_setup_64.o := n
>  KCOV_INSTRUMENT_paca.o := n
>
> +CFLAGS_setup_64.o  += -fno-stack-protector
> +CFLAGS_paca.o  += -fno-stack-protector
> +
>  extra-$(CONFIG_PPC_FPU)+= fpu.o
>  extra-$(CONFIG_ALTIVEC)+= vector.o
>  extra-$(CONFIG_PPC64)  += entry_64.o
> diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
> index 0ad15768d762..fe70834d7283 100644
> --- a/arch/powerpc/kernel/paca.c
> +++ b/arch/powerpc/kernel/paca.c
> @@ -208,7 +208,7 @@ static struct rtas_args * __init new_rtas_args(int cpu, 
> unsigned long limit)
>  struct paca_struct **paca_ptrs __read_mostly;
>  EXPORT_SYMBOL(paca_ptrs);
>
> -void __init __nostackprotector initialise_paca(struct paca_struct *new_paca, 
> int cpu)
> +void __init initialise_paca(struct paca_struct *new_paca, int cpu)
>  {
>  #ifdef CONFIG_PPC_PSERIES
> new_paca->lppaca_ptr = NULL;
> diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h
> index 2ec835574cc9..2dd0d9cb5a20 100644
> --- a/arch/powerpc/kernel/setup.h
> +++ b/arch/powerpc/kernel/setup.h
> @@ -8,12 +8,6 @@
>  #ifndef __ARCH_POWERPC_KERNEL_SETUP_H
>  #define __ARCH_POWERPC_KERNEL_SETUP_H
>
> -#ifdef CONFIG_CC_IS_CLANG
> -#define __nostackprotector
> -#else
> -#define __nostackprotector 
> __attribute__((__optimize__("no-stack-protector")))
> -#endif
> -
>  void initialize_cache_info(void);
>  void irqstack_early_init(void);
>
> diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
> index bb9cab3641d7..da447a62ea1e 100644
> --- a/arch/powerpc/kernel/setup_64.c
> +++ b/arch/powerpc/kernel/setup_64.c
> @@ -283,7 +283,7 @@ void __init record_spr_defaults(void)
>   * device-tree is not accessible via normal means at this point.
>   */
>
> -void __init __nostackprotector early_setup(unsigned long dt_ptr)
> +void __init early_setup(unsigned long dt_ptr)
>  {
> static __initdata struct paca_struct boot_paca;
>
> --
> 2.17.1
>


[PATCH] powerpc: avoid broken GCC __attribute__((optimize))

2020-10-28 Thread Ard Biesheuvel
Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
introduced a couple of uses of __attribute__((optimize)) with function
scope, to disable the stack protector in some early boot code.

Unfortunately, and this is documented in the GCC man pages [0], overriding
function attributes for optimization is broken, and is only supported for
debug scenarios, not for production: the problem appears to be that
setting GCC -f flags using this method will cause it to forget about some
or all other optimization settings that have been applied.

So the only safe way to disable the stack protector is to disable it for
the entire source file.

[0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Nick Desaulniers 
Cc: Arvind Sankar 
Cc: Randy Dunlap 
Cc: Josh Poimboeuf 
Cc: Thomas Gleixner 
Cc: Alexei Starovoitov 
Cc: Daniel Borkmann 
Cc: Peter Zijlstra (Intel) 
Cc: Geert Uytterhoeven 
Cc: Kees Cook 
Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot")
Signed-off-by: Ard Biesheuvel 
---
Related discussion here:
https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=gfhpzoj_hc...@mail.gmail.com/

TL;DR using __attribute__((optimize("-fno-gcse"))) in the BPF interpreter
causes the compiler to forget about -fno-asynchronous-unwind-tables passed
on the command line, resulting in unexpected .eh_frame sections in vmlinux.
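
(One quick way to check a build for this symptom, a hypothetical
invocation rather than one from this mail:

	$ readelf -S vmlinux | grep eh_frame

An unexpected .eh_frame section indicates that
-fno-asynchronous-unwind-tables was dropped.)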

 arch/powerpc/kernel/Makefile   | 3 +++
 arch/powerpc/kernel/paca.c | 2 +-
 arch/powerpc/kernel/setup.h| 6 --
 arch/powerpc/kernel/setup_64.c | 2 +-
 4 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index bf0bf1b900d2..fe2ef598e2ea 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -173,6 +173,9 @@ KCOV_INSTRUMENT_cputable.o := n
 KCOV_INSTRUMENT_setup_64.o := n
 KCOV_INSTRUMENT_paca.o := n
 
+CFLAGS_setup_64.o  += -fno-stack-protector
+CFLAGS_paca.o  += -fno-stack-protector
+
 extra-$(CONFIG_PPC_FPU)+= fpu.o
 extra-$(CONFIG_ALTIVEC)+= vector.o
 extra-$(CONFIG_PPC64)  += entry_64.o
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 0ad15768d762..fe70834d7283 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -208,7 +208,7 @@ static struct rtas_args * __init new_rtas_args(int cpu, 
unsigned long limit)
 struct paca_struct **paca_ptrs __read_mostly;
 EXPORT_SYMBOL(paca_ptrs);
 
-void __init __nostackprotector initialise_paca(struct paca_struct *new_paca, 
int cpu)
+void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 {
 #ifdef CONFIG_PPC_PSERIES
new_paca->lppaca_ptr = NULL;
diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h
index 2ec835574cc9..2dd0d9cb5a20 100644
--- a/arch/powerpc/kernel/setup.h
+++ b/arch/powerpc/kernel/setup.h
@@ -8,12 +8,6 @@
 #ifndef __ARCH_POWERPC_KERNEL_SETUP_H
 #define __ARCH_POWERPC_KERNEL_SETUP_H
 
-#ifdef CONFIG_CC_IS_CLANG
-#define __nostackprotector
-#else
-#define __nostackprotector __attribute__((__optimize__("no-stack-protector")))
-#endif
-
 void initialize_cache_info(void);
 void irqstack_early_init(void);
 
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index bb9cab3641d7..da447a62ea1e 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -283,7 +283,7 @@ void __init record_spr_defaults(void)
  * device-tree is not accessible via normal means at this point.
  */
 
-void __init __nostackprotector early_setup(unsigned long dt_ptr)
+void __init early_setup(unsigned long dt_ptr)
 {
static __initdata struct paca_struct boot_paca;
 
-- 
2.17.1



Re: [PATCH 1/3] powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()

2020-10-28 Thread Andreas Schwab
On Okt 28 2020, Michael Ellerman wrote:

> What config and compiler are you using?

gcc 4.9.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH kernel v3 2/2] powerpc/dma: Fallback to dma_ops when persistent memory present

2020-10-28 Thread Alexey Kardashevskiy
So far we have been using huge DMA windows to map all the RAM available.
The RAM is normally mapped to the VM address space contiguously, and
there is always a reasonable upper limit for possible future hot plugged
RAM which makes it easy to map all RAM via IOMMU.

Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike
normal RAM) can be mapped anywhere in the VM space beyond the maximum RAM
size, and since it can be used for DMA, it requires extending the huge
window up to MAX_PHYSMEM_BITS, which requires hypervisor support for:
1. huge TCE tables;
2. multilevel TCE tables;
3. huge IOMMU pages.

Certain hypervisors cannot do all of these, so the only option left is
to restrict the huge DMA window to include only RAM and fall back to
the default DMA window for persistent memory.

This defines arch_dma_map_direct/etc to allow the generic DMA code to
perform additional checks on whether direct DMA is still possible.

This checks if the system has persistent memory. If it does not,
the DMA bypass mode is selected, i.e.
* dev->bus_dma_limit = 0
* dev->dma_ops_bypass = true <- this avoids calling dma_ops for mapping.

If there is such memory, this creates an identity mapping only for RAM
and sets dev->bus_dma_limit to let the generic code decide whether to
call into the direct DMA or the indirect DMA ops.

This should not change the existing behaviour when there is no
persistent memory, as dev->dma_ops_bypass is expected to be set.
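
To make the interaction concrete, the resulting per-mapping decision
(condensed from dma_map_page_attrs() in patch 1/2 of this series) is:

	if (dma_map_direct(dev, ops) ||
	    arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
	else
		addr = ops->map_page(dev, page, offset, size, dir, attrs);

With no persistent memory, dev->dma_ops_bypass stays true and
dma_map_direct() short-circuits; with persistent memory present,
dev->bus_dma_limit is set and arch_dma_map_page_direct() compares the
end of the mapping against it.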

Signed-off-by: Alexey Kardashevskiy 
---
 arch/powerpc/kernel/dma-iommu.c| 70 +-
 arch/powerpc/platforms/pseries/iommu.c | 44 
 arch/powerpc/Kconfig   |  1 +
 3 files changed, 103 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index a1c744194018..21e2d9f059a9 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -10,6 +10,63 @@
 #include <linux/pci.h>
 #include <asm/iommu.h>
 
+#define can_map_direct(dev, addr) \
+   ((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))
+
+bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+{
+   if (likely(!dev->bus_dma_limit))
+   return false;
+
+   return can_map_direct(dev, addr);
+}
+EXPORT_SYMBOL_GPL(arch_dma_map_page_direct);
+
+#define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)
+
+bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+{
+   if (likely(!dev->bus_dma_limit))
+   return false;
+
+   return is_direct_handle(dev, dma_handle);
+}
+EXPORT_SYMBOL_GPL(arch_dma_unmap_page_direct);
+
+bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, int 
nents)
+{
+   struct scatterlist *s;
+   int i;
+
+   if (likely(!dev->bus_dma_limit))
+   return false;
+
+   for_each_sg(sg, s, nents, i) {
+   if (!can_map_direct(dev, sg_phys(s) + s->offset + s->length))
+   return false;
+   }
+
+   return true;
+}
+EXPORT_SYMBOL(arch_dma_map_sg_direct);
+
+bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, int 
nents)
+{
+   struct scatterlist *s;
+   int i;
+
+   if (likely(!dev->bus_dma_limit))
+   return false;
+
+   for_each_sg(sg, s, nents, i) {
+   if (!is_direct_handle(dev, s->dma_address + s->length))
+   return false;
+   }
+
+   return true;
+}
+EXPORT_SYMBOL(arch_dma_unmap_sg_direct);
+
 /*
  * Generic iommu implementation
  */
@@ -90,8 +147,17 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
struct iommu_table *tbl = get_iommu_table_base(dev);
 
if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
-   dev->dma_ops_bypass = true;
-   dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
+   /*
+* dma_iommu_bypass_supported() sets dma_max when there is
+* 1:1 mapping but it is somehow limited.
+* ibm,pmemory is one example.
+*/
+   dev->dma_ops_bypass = dev->bus_dma_limit == 0;
+   if (!dev->dma_ops_bypass)
+   dev_warn(dev, "iommu: 64-bit OK but direct DMA is 
limited by %llx\n",
+dev->bus_dma_limit);
+   else
+   dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
return 1;
}
 
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e4198700ed1a..91112e748491 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -839,7 +839,7 @@ static void remove_ddw(struct device_node *np, bool 
remove_prop)
np, ret);
 }
 
-static u64 find_existing_ddw(struct device_node *pdn)
+static u64 find_existing_ddw(struct device_node *pdn, int *window_shift)
 {
struct direct_window *window;
  

[PATCH kernel v3 0/2] DMA, powerpc/dma: Fallback to dma_ops when persistent memory present

2020-10-28 Thread Alexey Kardashevskiy
This allows mixing direct DMA (to/from RAM) and
IOMMU (to/from persistent memory) on the PPC64/pseries
platform.

This replaces https://lkml.org/lkml/2020/10/27/418
which replaces https://lkml.org/lkml/2020/10/20/1085


This is based on sha1
4525c8781ec0 Linus Torvalds "scsi: qla2xxx: remove incorrect sparse #ifdef".

Please comment. Thanks.



Alexey Kardashevskiy (2):
  dma: Allow mixing bypass and mapped DMA operation
  powerpc/dma: Fallback to dma_ops when persistent memory present

 arch/powerpc/kernel/dma-iommu.c| 70 +-
 arch/powerpc/platforms/pseries/iommu.c | 44 
 kernel/dma/mapping.c   | 24 +++--
 arch/powerpc/Kconfig   |  1 +
 kernel/dma/Kconfig |  4 ++
 5 files changed, 127 insertions(+), 16 deletions(-)

-- 
2.17.1



[PATCH kernel v3 1/2] dma: Allow mixing bypass and mapped DMA operation

2020-10-28 Thread Alexey Kardashevskiy
At the moment we allow bypassing DMA ops only when we can do this for
the entire RAM. However there are configs with mixed type memory
where we could still allow bypassing IOMMU in most cases;
POWERPC with persistent memory is one example.

This adds arch hooks to determine where bypass can still work, in which
case we invoke the direct DMA API. The following patch checks the bus
limit on POWERPC to allow or disallow direct mapping.

This adds a CONFIG_ARCH_HAS_DMA_MAP_DIRECT config option to make the
arch_ hooks no-ops by default.
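
For context, an architecture opts in by selecting the new symbol; on
powerpc this ends up as the one-line arch/powerpc/Kconfig change listed
in the next patch's diffstat (the exact placement below is an
assumption):

	config PPC
		...
		select ARCH_HAS_DMA_MAP_DIRECT		if PPC_PSERIES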

Signed-off-by: Alexey Kardashevskiy 
---
 kernel/dma/mapping.c | 24 
 kernel/dma/Kconfig   |  4 
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 51bb8fa8eb89..a0bc9eb876ed 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -137,6 +137,18 @@ static inline bool dma_map_direct(struct device *dev,
return dma_go_direct(dev, *dev->dma_mask, ops);
 }
 
+#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
+bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, int 
nents);
+bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, int 
nents);
+#else
+#define arch_dma_map_page_direct(d, a) (0)
+#define arch_dma_unmap_page_direct(d, a) (0)
+#define arch_dma_map_sg_direct(d, s, n) (0)
+#define arch_dma_unmap_sg_direct(d, s, n) (0)
+#endif
+
 dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
size_t offset, size_t size, enum dma_data_direction dir,
unsigned long attrs)
@@ -149,7 +161,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct 
page *page,
if (WARN_ON_ONCE(!dev->dma_mask))
return DMA_MAPPING_ERROR;
 
-   if (dma_map_direct(dev, ops))
+   if (dma_map_direct(dev, ops) ||
+   arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
else
addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -165,7 +178,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t 
addr, size_t size,
const struct dma_map_ops *ops = get_dma_ops(dev);
 
BUG_ON(!valid_dma_direction(dir));
-   if (dma_map_direct(dev, ops))
+   if (dma_map_direct(dev, ops) ||
+   arch_dma_unmap_page_direct(dev, addr + size))
dma_direct_unmap_page(dev, addr, size, dir, attrs);
else if (ops->unmap_page)
ops->unmap_page(dev, addr, size, dir, attrs);
@@ -188,7 +202,8 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist 
*sg, int nents,
if (WARN_ON_ONCE(!dev->dma_mask))
return 0;
 
-   if (dma_map_direct(dev, ops))
+   if (dma_map_direct(dev, ops) ||
+   arch_dma_map_sg_direct(dev, sg, nents))
ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
else
ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -207,7 +222,8 @@ void dma_unmap_sg_attrs(struct device *dev, struct 
scatterlist *sg,
 
BUG_ON(!valid_dma_direction(dir));
debug_dma_unmap_sg(dev, sg, nents, dir);
-   if (dma_map_direct(dev, ops))
+   if (dma_map_direct(dev, ops) ||
+   arch_dma_unmap_sg_direct(dev, sg, nents))
dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
else if (ops->unmap_sg)
ops->unmap_sg(dev, sg, nents, dir, attrs);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..43d106598e82 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -20,6 +20,10 @@ config DMA_OPS
 config DMA_OPS_BYPASS
bool
 
+# Lets platform IOMMU driver choose between bypass and IOMMU
+config ARCH_HAS_DMA_MAP_DIRECT
+   bool
+
 config NEED_SG_DMA_LENGTH
bool
 
-- 
2.17.1



Re: [PATCH kernel v2 1/2] dma: Allow mixing bypass and normal IOMMU operation

2020-10-28 Thread Alexey Kardashevskiy




On 28/10/2020 03:48, Christoph Hellwig wrote:
>> +static inline bool dma_handle_direct(struct device *dev, dma_addr_t dma_handle)
>> +{
>> +	return dma_handle >= dev->archdata.dma_offset;
>> +}
> 
> This won't compile except for powerpc, and directly accessing arch members
> in common code is a bad idea.  Maybe both your helpers need to be
> supplied by arch code to better abstract this out.

rats, overlooked it :( bus_dma_limit is generic but dma_offset is in
archdata :-/

>>  	if (dma_map_direct(dev, ops))
>>  		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
>> +	else if (dev->bus_dma_limit &&
>> +		 can_map_direct(dev, (phys_addr_t) page_to_phys(page) + offset + size))
>> +		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>> +#endif
> 
> I don't think page_to_phys needs a phys_addr_t on the return value.
> I'd also much prefer if we make this a little more beautiful, here
> are a few suggestions:
> 
>  - hide the bus_dma_limit check inside can_map_direct, and provide a
>    stub so that we can avoid the ifdef
>  - use a better name for can_map_direct, and maybe also a better calling
>    convention by passing the page (the sg code also has the page),

It is passing the address of the end of the mapped area, so passing a
page struct means passing the page and an offset, which is an extra
parameter, and we do not want to do anything with the page in those
hooks anyway, so I'd keep it as is.

>    and maybe even hide the dma_map_direct inside it.

Call dma_map_direct() from arch_dma_map_page_direct() if
arch_dma_map_page_direct() is defined? Seems suboptimal as it is going
to be bypass=true in most cases and we save one call by avoiding calling
arch_dma_map_page_direct(). Unless I missed something?

> 	if (dma_map_direct(dev, ops) ||
> 	    arch_dma_map_page_direct(dev, page, offset, size))
> 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
> 
>>  	BUG_ON(!valid_dma_direction(dir));
>>  	if (dma_map_direct(dev, ops))
>>  		dma_direct_unmap_page(dev, addr, size, dir, attrs);
>> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
>> +	else if (dev->bus_dma_limit && dma_handle_direct(dev, addr + size))
>> +		dma_direct_unmap_page(dev, addr, size, dir, attrs);
>> +#endif
> 
> Same here.
> 
>>  	if (dma_map_direct(dev, ops))
>>  		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
>> +	else if (dev->bus_dma_limit) {
>> +		struct scatterlist *s;
>> +		bool direct = true;
>> +		int i;
>> +
>> +		for_each_sg(sg, s, nents, i) {
>> +			direct = can_map_direct(dev, sg_phys(s) + s->offset + s->length);
>> +			if (!direct)
>> +				break;
>> +		}
>> +		if (direct)
>> +			ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>> +		else
>> +			ents = ops->map_sg(dev, sg, nents, dir, attrs);
>> +	}
>> +#endif
> 
> This needs to go into a helper as well.  I think the same style as
> above would work pretty nicely as well:

Yup. I'll repost v3 soon with this change. Thanks for the review.

> 	if (dma_map_direct(dev, ops) ||
> 	    arch_dma_map_sg_direct(dev, sg, nents))
> 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
> 	else
> 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
> 
>> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
>> +	if (dev->bus_dma_limit) {
>> +		struct scatterlist *s;
>> +		bool direct = true;
>> +		int i;
>> +
>> +		for_each_sg(sg, s, nents, i) {
>> +			direct = dma_handle_direct(dev, s->dma_address + s->length);
>> +			if (!direct)
>> +				break;
>> +		}
>> +		if (direct) {
>> +			dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
>> +			return;
>> +		}
>> +	}
>> +#endif
> 
> One more time here..



--
Alexey