Re: [PATCH] powerpc/eeh_cache: Fix a possible debugfs deadlock
On Thu, Oct 29, 2020 at 2:27 AM Qian Cai wrote:
>
> Lockdep complains about a possible deadlock below in
> eeh_addr_cache_show(), because it acquires a lock with IRQs enabled
> while eeh_addr_cache_insert_dev() acquires the same lock with IRQs
> disabled. Let's just make eeh_addr_cache_show() acquire the lock with
> IRQs disabled as well.
>
>        CPU0                              CPU1
>        ----                              ----
>   lock(&pci_io_addr_cache_root.piar_lock);
>                                   local_irq_disable();
>                                   lock(&desc->lock);
>                                   lock(&pci_io_addr_cache_root.piar_lock);
>   <Interrupt>
>     lock(&desc->lock);
>
>  *** DEADLOCK ***
>
> lock_acquire+0x140/0x5f0
> _raw_spin_lock_irqsave+0x64/0xb0
> eeh_addr_cache_insert_dev+0x48/0x390
> eeh_probe_device+0xb8/0x1a0
> pnv_pcibios_bus_add_device+0x3c/0x80
> pcibios_bus_add_device+0x118/0x290
> pci_bus_add_device+0x28/0xe0
> pci_bus_add_devices+0x54/0xb0
> pcibios_init+0xc4/0x124
> do_one_initcall+0xac/0x528
> kernel_init_freeable+0x35c/0x3fc
> kernel_init+0x24/0x148
> ret_from_kernel_thread+0x5c/0x80
>
> lock_acquire+0x140/0x5f0
> _raw_spin_lock+0x4c/0x70
> eeh_addr_cache_show+0x38/0x110
> seq_read+0x1a0/0x660
> vfs_read+0xc8/0x1f0
> ksys_read+0x74/0x130
> system_call_exception+0xf8/0x1d0
> system_call_common+0xe8/0x218
>
> Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache")
> Signed-off-by: Qian Cai

Good catch,

Reviewed-by: Oliver O'Halloran
Re: [RFC PATCH kernel 1/2] irq: Add reference counting to IRQ mappings
On 28/10/2020 03:09, Marc Zyngier wrote:

Hi Alexey,

On 2020-10-27 09:06, Alexey Kardashevskiy wrote:

PCI devices share 4 legacy INTx interrupts from the same PCI host
bridge. Device drivers map/unmap hardware interrupts via
irq_create_mapping()/irq_dispose_mapping(). The problem is that these
interrupts are shared, and when performing hot unplug we need to unmap
the interrupt only when the last device is released.

This reuses the already existing irq_desc::kobj for this purpose. The
refcounter is naturally 1 when the descriptor is allocated; this adds
kobject_get() in places where an already existing mapped virq is
returned.

That's quite interesting, as I was about to revive a patch series that
reworks the irqdomain subsystem to directly cache irq_desc instead of
raw interrupt numbers. And for that, I needed some form of
refcounting...

This reorganizes irq_dispose_mapping() to release the kobj and let the
release callback do the cleanup. If some driver or platform does its
own reference counting, this expects those parties to call
irq_find_mapping() and call irq_dispose_mapping() for every
irq_create_fwspec_mapping()/irq_create_mapping().
Signed-off-by: Alexey Kardashevskiy
---
 kernel/irq/irqdesc.c   | 35 +++
 kernel/irq/irqdomain.c | 27 +--
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 1a7723604399..dae096238500 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -419,20 +419,39 @@ static struct irq_desc *alloc_desc(int irq, int node, unsigned int flags,
 	return NULL;
 }

+static void delayed_free_desc(struct rcu_head *rhp);
 static void irq_kobj_release(struct kobject *kobj)
 {
 	struct irq_desc *desc = container_of(kobj, struct irq_desc, kobj);
+	struct irq_domain *domain;
+	unsigned int virq = desc->irq_data.irq;

-	free_masks(desc);
-	free_percpu(desc->kstat_irqs);
-	kfree(desc);
+	domain = desc->irq_data.domain;
+	if (domain) {
+		if (irq_domain_is_hierarchy(domain)) {
+			irq_domain_free_irqs(virq, 1);

How does this work with hierarchical domains? Each domain should
contribute as a reference on the irq_desc. But if you got here, it
means the refcount has already dropped to 0. So either there is nothing
to free here, or you don't track the references implied by the
hierarchy. I suspect the latter.

This is correct, I did not look at hierarchy yet, looking now...

+		} else {
+			irq_domain_disassociate(domain, virq);
+			irq_free_desc(virq);
+		}
+	}
+
+	/*
+	 * We free the descriptor, masks and stat fields via RCU. That
+	 * allows demultiplex interrupts to do rcu based management of
+	 * the child interrupts.
+	 * This also allows us to use rcu in kstat_irqs_usr().
+	 */
+	call_rcu(&desc->rcu, delayed_free_desc);
 }

 static void delayed_free_desc(struct rcu_head *rhp)
 {
 	struct irq_desc *desc = container_of(rhp, struct irq_desc, rcu);

-	kobject_put(&desc->kobj);
+	free_masks(desc);
+	free_percpu(desc->kstat_irqs);
+	kfree(desc);
 }

 static void free_desc(unsigned int irq)
@@ -453,14 +472,6 @@ static void free_desc(unsigned int irq)
 	 */
 	irq_sysfs_del(desc);
 	delete_irq_desc(irq);
-
-	/*
-	 * We free the descriptor, masks and stat fields via RCU.
That
-	 * allows demultiplex interrupts to do rcu based management of
-	 * the child interrupts.
-	 * This also allows us to use rcu in kstat_irqs_usr().
-	 */
-	call_rcu(&desc->rcu, delayed_free_desc);
 }

 static int alloc_descs(unsigned int start, unsigned int cnt, int node,
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index cf8b374b892d..02733ddc321f 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -638,6 +638,7 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 {
 	struct device_node *of_node;
 	int virq;
+	struct irq_desc *desc;

 	pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq);

@@ -655,7 +656,9 @@ unsigned int irq_create_mapping(struct irq_domain *domain,
 	/* Check if mapping already exists */
 	virq = irq_find_mapping(domain, hwirq);
 	if (virq) {
+		desc = irq_to_desc(virq);
 		pr_debug("-> existing mapping on virq %d\n", virq);
+		kobject_get(&desc->kobj);

My worry with this is that there is probably a significant amount of
code out there that relies on multiple calls to irq_create_mapping()
with the same parameters not to have any side effects. They would
expect a subsequent irq_dispose_mapping() to drop the translation
altogether, and that's obviously not the case here. Have you audited
the various call sites to see what could break?

The vast majority calls one of irq..create_mapping in init/probe and
then calls irq_dispose_mapping() right there if probing failed or when
the driver is unloaded. I could not spot any reference counting anywhere,
[PATCH kernel v4 1/2] dma: Allow mixing bypass and mapped DMA operation
At the moment we allow bypassing DMA ops only when we can do this for
the entire RAM. However there are configs with mixed type memory where
we could still allow bypassing the IOMMU in most cases; POWERPC with
persistent memory is one example.

This adds an arch hook to determine whether bypass can still work, in
which case the direct DMA API is invoked. The following patch checks
the bus limit on POWERPC to allow or disallow direct mapping.

This adds a CONFIG_ARCH_HAS_DMA_MAP_DIRECT config option to make the
arch_ hooks no-op by default.

Signed-off-by: Alexey Kardashevskiy
---
Changes:
v4:
* wrapped long lines
---
 kernel/dma/mapping.c | 26 ++
 kernel/dma/Kconfig   |  4
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 51bb8fa8eb89..ad1f052e046d 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -137,6 +137,20 @@ static inline bool dma_map_direct(struct device *dev,
 	return dma_go_direct(dev, *dev->dma_mask, ops);
 }

+#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
+bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg,
+		int nents);
+bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg,
+		int nents);
+#else
+#define arch_dma_map_page_direct(d, a) (0)
+#define arch_dma_unmap_page_direct(d, a) (0)
+#define arch_dma_map_sg_direct(d, s, n) (0)
+#define arch_dma_unmap_sg_direct(d, s, n) (0)
+#endif
+
 dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		size_t offset, size_t size, enum dma_data_direction dir,
 		unsigned long attrs)
@@ -149,7 +163,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 	if (WARN_ON_ONCE(!dev->dma_mask))
 		return DMA_MAPPING_ERROR;

-	if (dma_map_direct(dev, ops))
+	if (dma_map_direct(dev, ops) ||
+	    arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
 		addr = dma_direct_map_page(dev,
page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -165,7 +180,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	const struct dma_map_ops *ops = get_dma_ops(dev);

 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_map_direct(dev, ops))
+	if (dma_map_direct(dev, ops) ||
+	    arch_dma_unmap_page_direct(dev, addr + size))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
@@ -188,7 +204,8 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
 	if (WARN_ON_ONCE(!dev->dma_mask))
 		return 0;

-	if (dma_map_direct(dev, ops))
+	if (dma_map_direct(dev, ops) ||
+	    arch_dma_map_sg_direct(dev, sg, nents))
 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -207,7 +224,8 @@ void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (dma_map_direct(dev, ops))
+	if (dma_map_direct(dev, ops) ||
+	    arch_dma_unmap_sg_direct(dev, sg, nents))
 		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
 	else if (ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..43d106598e82 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -20,6 +20,10 @@ config DMA_OPS
 config DMA_OPS_BYPASS
 	bool

+# Lets platform IOMMU driver choose between bypass and IOMMU
+config ARCH_HAS_DMA_MAP_DIRECT
+	bool
+
 config NEED_SG_DMA_LENGTH
 	bool
--
2.17.1
[PATCH kernel v4 2/2] powerpc/dma: Fallback to dma_ops when persistent memory present
So far we have been using huge DMA windows to map all the RAM
available. The RAM is normally mapped to the VM address space
contiguously, and there is always a reasonable upper limit for possible
future hot plugged RAM which makes it easy to map all RAM via IOMMU.

Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike
normal RAM) can be mapped anywhere in the VM space beyond the maximum
RAM size, and since it can be used for DMA, it requires extending the
huge window up to MAX_PHYSMEM_BITS which requires hypervisor support
for:
1. huge TCE tables;
2. multilevel TCE tables;
3. huge IOMMU pages.

Certain hypervisors cannot do all of these, so the only option left is
restricting the huge DMA window to include only RAM and falling back to
the default DMA window for persistent memory.

This defines arch_dma_map_direct/etc to allow the generic DMA code to
perform additional checks on whether direct DMA is still possible.

This checks if the system has persistent memory. If it does not, the
DMA bypass mode is selected, i.e.
* dev->bus_dma_limit = 0
* dev->dma_ops_bypass = true <- this avoids calling dma_ops for mapping.

If there is such memory, this creates an identity mapping only for RAM
and sets dev->bus_dma_limit to let the generic code decide whether to
call into the direct DMA or the indirect DMA ops.

This should not change the existing behaviour when there is no
persistent memory, as dev->dma_ops_bypass is expected to be set.
Signed-off-by: Alexey Kardashevskiy --- Changes: v4: * fixed leaked device_node * wrapped long lines --- arch/powerpc/kernel/dma-iommu.c| 73 +- arch/powerpc/platforms/pseries/iommu.c | 51 ++ arch/powerpc/Kconfig | 1 + 3 files changed, 113 insertions(+), 12 deletions(-) diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c index a1c744194018..e5e9e5e3e3ca 100644 --- a/arch/powerpc/kernel/dma-iommu.c +++ b/arch/powerpc/kernel/dma-iommu.c @@ -10,6 +10,65 @@ #include #include +#define can_map_direct(dev, addr) \ + ((dev)->bus_dma_limit >= phys_to_dma((dev), (addr))) + +bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr) +{ + if (likely(!dev->bus_dma_limit)) + return false; + + return can_map_direct(dev, addr); +} +EXPORT_SYMBOL_GPL(arch_dma_map_page_direct); + +#define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset) + +bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle) +{ + if (likely(!dev->bus_dma_limit)) + return false; + + return is_direct_handle(dev, dma_handle); +} +EXPORT_SYMBOL_GPL(arch_dma_unmap_page_direct); + +bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, + int nents) +{ + struct scatterlist *s; + int i; + + if (likely(!dev->bus_dma_limit)) + return false; + + for_each_sg(sg, s, nents, i) { + if (!can_map_direct(dev, sg_phys(s) + s->offset + s->length)) + return false; + } + + return true; +} +EXPORT_SYMBOL(arch_dma_map_sg_direct); + +bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, + int nents) +{ + struct scatterlist *s; + int i; + + if (likely(!dev->bus_dma_limit)) + return false; + + for_each_sg(sg, s, nents, i) { + if (!is_direct_handle(dev, s->dma_address + s->length)) + return false; + } + + return true; +} +EXPORT_SYMBOL(arch_dma_unmap_sg_direct); + /* * Generic iommu implementation */ @@ -90,8 +149,18 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask) struct iommu_table *tbl = get_iommu_table_base(dev); if 
(dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) { - dev->dma_ops_bypass = true; - dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n"); + /* +* dma_iommu_bypass_supported() sets dma_max when there is +* 1:1 mapping but it is somehow limited. +* ibm,pmemory is one example. +*/ + dev->dma_ops_bypass = dev->bus_dma_limit == 0; + if (!dev->dma_ops_bypass) + dev_warn(dev, +"iommu: 64-bit OK but direct DMA is limited by %llx\n", +dev->bus_dma_limit); + else + dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n"); return 1; } diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c index e4198700ed1a..9fc5217f0c8e 100644 --- a/arch/powerpc/platforms/pseries/iommu.c +++ b/arch/powerpc/platforms/pseries/iommu.c @@ -839,7 +839,7 @@ static void remove_ddw(struct device_node *np, bool remove_prop) np, ret); } -static u64
[PATCH kernel v4 0/2] DMA, powerpc/dma: Fallback to dma_ops when persistent memory present
This allows mixing direct DMA (to/from RAM) and IOMMU (to/from
persistent memory) on the PPC64/pseries platform.

This replaces https://lkml.org/lkml/2020/10/28/929
which replaces https://lkml.org/lkml/2020/10/27/418
which replaces https://lkml.org/lkml/2020/10/20/1085

This is based on sha1 4525c8781ec0 Linus Torvalds "scsi: qla2xxx:
remove incorrect sparse #ifdef".

Please comment. Thanks.

Alexey Kardashevskiy (2):
  dma: Allow mixing bypass and mapped DMA operation
  powerpc/dma: Fallback to dma_ops when persistent memory present

 arch/powerpc/kernel/dma-iommu.c        | 73 +-
 arch/powerpc/platforms/pseries/iommu.c | 51 ++
 kernel/dma/mapping.c                   | 26 +++--
 arch/powerpc/Kconfig                   |  1 +
 kernel/dma/Kconfig                     |  4 ++
 5 files changed, 139 insertions(+), 16 deletions(-)

--
2.17.1
Re: [PATCH] ibmvscsi: fix potential race after loss of transport
On 10/27/20 10:21 PM, Michael Ellerman wrote:
> Tyrel Datwyler writes:
>> After a loss of transport due to an adapter migration or crash/disconnect from
>> the host partner there is a tiny window where we can race adjusting the
>> request_limit of the adapter. The request limit is atomically inc/dec to track
>> the number of inflight requests against the allowed limit of our VIOS partner.
>> After a transport loss we set the request_limit to zero to reflect this state.
>> However, there is a window where the adapter may attempt to queue a command
>> because the transport loss event hasn't been fully processed yet and
>> request_limit is still greater than zero. The hypercall to send the event will
>> fail and the error path will increment the request_limit as a result. If the
>> adapter processes the transport event prior to this increment the request_limit
>> becomes out of sync with the adapter state and can result in scsi commands being
>> submitted on the now reset connection prior to an SRP Login resulting in a
>> protocol violation.
>>
>> Fix this race by protecting request_limit with the host lock when changing the
>> value via atomic_set() to indicate no transport.
>>
>> Signed-off-by: Tyrel Datwyler
>> ---
>>  drivers/scsi/ibmvscsi/ibmvscsi.c | 36 +++-
>>  1 file changed, 26 insertions(+), 10 deletions(-)
>>
>> diff --git a/drivers/scsi/ibmvscsi/ibmvscsi.c b/drivers/scsi/ibmvscsi/ibmvscsi.c
>> index b1f3017b6547..188ed75417a5 100644
>> --- a/drivers/scsi/ibmvscsi/ibmvscsi.c
>> +++ b/drivers/scsi/ibmvscsi/ibmvscsi.c
>> @@ -806,6 +806,22 @@ static void purge_requests(struct ibmvscsi_host_data *hostdata, int error_code)
>> 	spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>> }
>>
>> +/**
>> + * ibmvscsi_set_request_limit - Set the adapter request_limit in response to
>> + * an adapter failure, reset, or SRP Login. Done under host lock to prevent
>> + * race with scsi command submission.
>> + * @hostdata: adapter to adjust
>> + * @limit: new request limit
>> + */
>> +static void ibmvscsi_set_request_limit(struct ibmvscsi_host_data *hostdata, int limit)
>> +{
>> +	unsigned long flags;
>> +
>> +	spin_lock_irqsave(hostdata->host->host_lock, flags);
>> +	atomic_set(&hostdata->request_limit, limit);
>> +	spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>> +}
>> +
>> /**
>>  * ibmvscsi_reset_host - Reset the connection to the server
>>  * @hostdata: struct ibmvscsi_host_data to reset
> ...
>> @@ -2137,12 +2153,12 @@ static void ibmvscsi_do_work(struct ibmvscsi_host_data *hostdata)
>> 	}
>>
>> 	hostdata->action = IBMVSCSI_HOST_ACTION_NONE;
>> +	spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>
> You drop the lock ...
>
>> 	if (rc) {
>> -		atomic_set(&hostdata->request_limit, -1);
>> +		ibmvscsi_set_request_limit(hostdata, -1);
>
> ... then retake it, then drop it again in ibmvscsi_set_request_limit().
>
> Which introduces the possibility that something else gets the lock
> before you can set the limit to -1.
>
> I'm not sure that's a bug, but it's not obviously correct either?

Yeah, I'd already moved the request_limit update into its own function
before I got to this case which made me a bit uneasy when I realized I
had to drop the lock because my new function takes the lock. However,
we only need to protect ourselves from racing with queuecommand() which
is locked for its entire call. Further, if we've gotten here it means
we were either resetting or re-enabling the adapter which would have
already set request_limit to zero. At this point the transport was
already gone and we've further failed to reset it. Also, we've blocked
any new scsi requests at this point.

-Tyrel

> cheers

>> 		dev_err(hostdata->dev, "error after %s\n", action);
>> 	}
>> -	spin_unlock_irqrestore(hostdata->host->host_lock, flags);
>>
>> 	scsi_unblock_requests(hostdata->host);
>> }
Re: [PATCH kernel v3 2/2] powerpc/dma: Fallback to dma_ops when persistent memory present
On 29/10/2020 11:40, Michael Ellerman wrote:

Alexey Kardashevskiy writes:

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e4198700ed1a..91112e748491 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -,11 +1112,13 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
  */
 static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 {
-	int len, ret;
+	int len = 0, ret;
+	bool pmem_present = of_find_node_by_type(NULL, "ibm,pmemory") != NULL;

That leaks a reference on the returned node.

	dn = of_find_node_by_type(NULL, "ibm,pmemory");
	pmem_present = dn != NULL;
	of_node_put(dn);

ah, true. v4 then.

@@ -1126,7 +1129,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)

 	mutex_lock(&direct_window_init_mutex);

-	dma_addr = find_existing_ddw(pdn);
+	dma_addr = find_existing_ddw(pdn, &len);

I don't see len used anywhere?

 	if (dma_addr != 0)
 		goto out_unlock;

@@ -1212,14 +1215,26 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
 	}
 	/* verify the window * number of ptes will map the partition */
 	/* check largest block * page size > max memory hotplug addr */
-	max_addr = ddw_memory_hotplug_max();
-	if (query.largest_available_block < (max_addr >> page_shift)) {
-		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
-			"%llu-sized pages\n", max_addr, query.largest_available_block,
-			1ULL << page_shift);
+	/*
+	 * The "ibm,pmemory" can appear anywhere in the address space.
+	 * Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
+	 * for the upper limit and fallback to max RAM otherwise but this
+	 * disables device::dma_ops_bypass.
+	 */
+	len = max_ram_len;

Here you override whatever find_existing_ddw() wrote to len?

Not always, there is a bunch of gotos before this line to the end of
the function and one (which returns the existing window) is legit.
Thanks,

+	if (pmem_present) {
+		if (query.largest_available_block >=
+		    (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
+			len = MAX_PHYSMEM_BITS - page_shift;
+		else
+			dev_info(&dev->dev, "Skipping ibm,pmemory");
+	}
+
+	if (query.largest_available_block < (1ULL << (len - page_shift))) {
+		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu %llu-sized pages\n",
+			1ULL << len, query.largest_available_block, 1ULL << page_shift);
 		goto out_failed;
 	}
-	len = order_base_2(max_addr);
 	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
 	if (!win64) {
 		dev_info(&dev->dev,

cheers

--
Alexey
Re: [PATCH kernel v3 2/2] powerpc/dma: Fallback to dma_ops when persistent memory present
Alexey Kardashevskiy writes:
> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> index e4198700ed1a..91112e748491 100644
> --- a/arch/powerpc/platforms/pseries/iommu.c
> +++ b/arch/powerpc/platforms/pseries/iommu.c
> @@ -,11 +1112,13 @@ static void reset_dma_window(struct pci_dev *dev, struct device_node *par_dn)
>   */
>  static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>  {
> -	int len, ret;
> +	int len = 0, ret;
> +	bool pmem_present = of_find_node_by_type(NULL, "ibm,pmemory") != NULL;

That leaks a reference on the returned node.

	dn = of_find_node_by_type(NULL, "ibm,pmemory");
	pmem_present = dn != NULL;
	of_node_put(dn);

> @@ -1126,7 +1129,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>
> 	mutex_lock(&direct_window_init_mutex);
>
> -	dma_addr = find_existing_ddw(pdn);
> +	dma_addr = find_existing_ddw(pdn, &len);

I don't see len used anywhere?

> 	if (dma_addr != 0)
> 		goto out_unlock;
>
> @@ -1212,14 +1215,26 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
> 	}
> 	/* verify the window * number of ptes will map the partition */
> 	/* check largest block * page size > max memory hotplug addr */
> -	max_addr = ddw_memory_hotplug_max();
> -	if (query.largest_available_block < (max_addr >> page_shift)) {
> -		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu "
> -			"%llu-sized pages\n", max_addr, query.largest_available_block,
> -			1ULL << page_shift);
> +	/*
> +	 * The "ibm,pmemory" can appear anywhere in the address space.
> +	 * Assuming it is still backed by page structs, try MAX_PHYSMEM_BITS
> +	 * for the upper limit and fallback to max RAM otherwise but this
> +	 * disables device::dma_ops_bypass.
> +	 */
> +	len = max_ram_len;

Here you override whatever find_existing_ddw() wrote to len?
> +	if (pmem_present) {
> +		if (query.largest_available_block >=
> +		    (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
> +			len = MAX_PHYSMEM_BITS - page_shift;
> +		else
> +			dev_info(&dev->dev, "Skipping ibm,pmemory");
> +	}
> +
> +	if (query.largest_available_block < (1ULL << (len - page_shift))) {
> +		dev_dbg(&dev->dev, "can't map partition max 0x%llx with %llu %llu-sized pages\n",
> +			1ULL << len, query.largest_available_block, 1ULL << page_shift);
> 		goto out_failed;
> 	}
> -	len = order_base_2(max_addr);
> 	win64 = kzalloc(sizeof(struct property), GFP_KERNEL);
> 	if (!win64) {
> 		dev_info(&dev->dev,

cheers
Re: [PATCH] powerpc/smp: Move rcu_cpu_starting() earlier
On Thu, Oct 29, 2020 at 11:09:07AM +1100, Michael Ellerman wrote: > Qian Cai writes: > > The call to rcu_cpu_starting() in start_secondary() is not early enough > > in the CPU-hotplug onlining process, which results in lockdep splats as > > follows: > > Since when? > What kernel version? > > I haven't seen this running CPU hotplug tests with PROVE_LOCKING=y on > v5.10-rc1. Am I missing a CONFIG? My guess would be that adding CONFIG_PROVE_RAW_LOCK_NESTING=y will get you some splats. Thanx, Paul > cheers > > > > WARNING: suspicious RCU usage > > - > > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > > > other info that might help us debug this: > > > > RCU used illegally from offline CPU! > > rcu_scheduler_active = 1, debug_locks = 1 > > no locks held by swapper/1/0. > > > > Call Trace: > > dump_stack+0xec/0x144 (unreliable) > > lockdep_rcu_suspicious+0x128/0x14c > > __lock_acquire+0x1060/0x1c60 > > lock_acquire+0x140/0x5f0 > > _raw_spin_lock_irqsave+0x64/0xb0 > > clockevents_register_device+0x74/0x270 > > register_decrementer_clockevent+0x94/0x110 > > start_secondary+0x134/0x800 > > start_secondary_prolog+0x10/0x14 > > > > This is avoided by moving the call to rcu_cpu_starting up near the > > beginning of the start_secondary() function. Note that the > > raw_smp_processor_id() is required in order to avoid calling into > > lockdep before RCU has declared the CPU to be watched for readers. > > > > Link: > > https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ > > Signed-off-by: Qian Cai > > --- > > arch/powerpc/kernel/smp.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c > > index 3c6b9822f978..8c2857cbd960 100644 > > --- a/arch/powerpc/kernel/smp.c > > +++ b/arch/powerpc/kernel/smp.c > > @@ -1393,13 +1393,14 @@ static void add_cpu_to_masks(int cpu) > > /* Activate a secondary processor. 
*/
> > void start_secondary(void *unused)
> > {
> > -	unsigned int cpu = smp_processor_id();
> > +	unsigned int cpu = raw_smp_processor_id();
> >
> > 	mmgrab(&init_mm);
> > 	current->active_mm = &init_mm;
> >
> > 	smp_store_cpu_info(cpu);
> > 	set_dec(tb_ticks_per_jiffy);
> > +	rcu_cpu_starting(cpu);
> > 	preempt_disable();
> > 	cpu_callin_map[cpu] = 1;
> >
> > --
> > 2.28.0
Re: [PATCH 01/13] PCI: dwc/imx6: Drop setting PCI_MSI_FLAGS_ENABLE
Rob Herring writes: > No other host driver sets the PCI_MSI_FLAGS_ENABLE bit, so it must not > be necessary. If it is, a comment is needed. Yeah, but git blame directly points to: 75cb8d20c112 ("PCI: imx: Enable MSI from downstream components") Which has a pretty long explanation. The relevant bit probably being: ... on i.MX6, the MSI Enable bit controls delivery of MSI interrupts from components below the Root Port. So it seems a little rash to just remove the code. cheers > Cc: Richard Zhu > Cc: Lucas Stach > Cc: Lorenzo Pieralisi > Cc: Bjorn Helgaas > Cc: Shawn Guo > Cc: Sascha Hauer > Cc: Pengutronix Kernel Team > Cc: Fabio Estevam > Cc: NXP Linux Team > Signed-off-by: Rob Herring > --- > drivers/pci/controller/dwc/pci-imx6.c | 8 > 1 file changed, 8 deletions(-) > > diff --git a/drivers/pci/controller/dwc/pci-imx6.c > b/drivers/pci/controller/dwc/pci-imx6.c > index 5cf1ef12fb9b..7dd137d62dca 100644 > --- a/drivers/pci/controller/dwc/pci-imx6.c > +++ b/drivers/pci/controller/dwc/pci-imx6.c > @@ -1002,7 +1002,6 @@ static int imx6_pcie_probe(struct platform_device *pdev) > struct resource *dbi_base; > struct device_node *node = dev->of_node; > int ret; > - u16 val; > > imx6_pcie = devm_kzalloc(dev, sizeof(*imx6_pcie), GFP_KERNEL); > if (!imx6_pcie) > @@ -1167,13 +1166,6 @@ static int imx6_pcie_probe(struct platform_device > *pdev) > if (ret < 0) > return ret; > > - if (pci_msi_enabled()) { > - u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI); > - val = dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS); > - val |= PCI_MSI_FLAGS_ENABLE; > - dw_pcie_writew_dbi(pci, offset + PCI_MSI_FLAGS, val); > - } > - > return 0; > } > > -- > 2.25.1
Re: [PATCH] powerpc/smp: Move rcu_cpu_starting() earlier
Qian Cai writes: > The call to rcu_cpu_starting() in start_secondary() is not early enough > in the CPU-hotplug onlining process, which results in lockdep splats as > follows: Since when? What kernel version? I haven't seen this running CPU hotplug tests with PROVE_LOCKING=y on v5.10-rc1. Am I missing a CONFIG? cheers > WARNING: suspicious RCU usage > - > kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!! > > other info that might help us debug this: > > RCU used illegally from offline CPU! > rcu_scheduler_active = 1, debug_locks = 1 > no locks held by swapper/1/0. > > Call Trace: > dump_stack+0xec/0x144 (unreliable) > lockdep_rcu_suspicious+0x128/0x14c > __lock_acquire+0x1060/0x1c60 > lock_acquire+0x140/0x5f0 > _raw_spin_lock_irqsave+0x64/0xb0 > clockevents_register_device+0x74/0x270 > register_decrementer_clockevent+0x94/0x110 > start_secondary+0x134/0x800 > start_secondary_prolog+0x10/0x14 > > This is avoided by moving the call to rcu_cpu_starting up near the > beginning of the start_secondary() function. Note that the > raw_smp_processor_id() is required in order to avoid calling into > lockdep before RCU has declared the CPU to be watched for readers. > > Link: > https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/ > Signed-off-by: Qian Cai > --- > arch/powerpc/kernel/smp.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c > index 3c6b9822f978..8c2857cbd960 100644 > --- a/arch/powerpc/kernel/smp.c > +++ b/arch/powerpc/kernel/smp.c > @@ -1393,13 +1393,14 @@ static void add_cpu_to_masks(int cpu) > /* Activate a secondary processor. 
*/
> void start_secondary(void *unused)
> {
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu = raw_smp_processor_id();
>
> 	mmgrab(&init_mm);
> 	current->active_mm = &init_mm;
>
> 	smp_store_cpu_info(cpu);
> 	set_dec(tb_ticks_per_jiffy);
> +	rcu_cpu_starting(cpu);
> 	preempt_disable();
> 	cpu_callin_map[cpu] = 1;
>
> --
> 2.28.0
Re: [PATCH 0/4] Powerpc: Better preemption for shared processor
On 10/28/20 8:35 AM, Srikar Dronamraju wrote:

Currently, vcpu_is_preempted will return the yield_count for
shared_processor. On a PowerVM LPAR, Phyp schedules at SMT8 core
boundary i.e all CPUs belonging to a core are either group scheduled in
or group scheduled out. This can be used to better predict non-preempted
CPUs on PowerVM shared LPARs.

perf stat -r 5 -a perf bench sched pipe -l 1000 (lesser time is better)

powerpc/next
     35,107,951.20 msec cpu-clock                 #  255.898 CPUs utilized            ( +-  0.31% )
        23,655,348      context-switches          #    0.674 K/sec                    ( +-  3.72% )
            14,465      cpu-migrations            #    0.000 K/sec                    ( +-  5.37% )
            82,463      page-faults               #    0.002 K/sec                    ( +-  8.40% )
 1,127,182,328,206      cycles                    #    0.032 GHz                      ( +-  1.60% )  (66.67%)
    78,587,300,622      stalled-cycles-frontend   #    6.97% frontend cycles idle     ( +-  0.08% )  (50.01%)
   654,124,218,432      stalled-cycles-backend    #   58.03% backend cycles idle      ( +-  1.74% )  (50.01%)
   834,013,059,242      instructions              #    0.74  insn per cycle
                                                  #    0.78  stalled cycles per insn  ( +-  0.73% )  (66.67%)
   132,911,454,387      branches                  #    3.786 M/sec                    ( +-  0.59% )  (50.00%)
     2,890,882,143      branch-misses             #    2.18% of all branches          ( +-  0.46% )  (50.00%)

           137.195 +- 0.419 seconds time elapsed  ( +-  0.31% )

powerpc/next + patchset
     29,981,702.64 msec cpu-clock                 #  255.881 CPUs utilized            ( +-  1.30% )
        40,162,456      context-switches          #    0.001 M/sec                    ( +-  0.01% )
             1,110      cpu-migrations            #    0.000 K/sec                    ( +-  5.20% )
            62,616      page-faults               #    0.002 K/sec                    ( +-  3.93% )
 1,430,030,626,037      cycles                    #    0.048 GHz                      ( +-  1.41% )  (66.67%)
    83,202,707,288      stalled-cycles-frontend   #    5.82% frontend cycles idle     ( +-  0.75% )  (50.01%)
   744,556,088,520      stalled-cycles-backend    #   52.07% backend cycles idle      ( +-  1.39% )  (50.01%)
   940,138,418,674      instructions              #    0.66  insn per cycle
                                                  #    0.79  stalled cycles per insn  ( +-  0.51% )  (66.67%)
   146,452,852,283      branches                  #    4.885 M/sec                    ( +-  0.80% )  (50.00%)
     3,237,743,996      branch-misses             #    2.21% of all branches          ( +-  1.18% )  (50.01%)

            117.17 +- 1.52 seconds time elapsed  ( +-  1.30% )

This is around 14.6% improvement in performance.
> Cc: linuxppc-dev
> Cc: LKML
> Cc: Michael Ellerman
> Cc: Nicholas Piggin
> Cc: Nathan Lynch
> Cc: Gautham R Shenoy
> Cc: Peter Zijlstra
> Cc: Valentin Schneider
> Cc: Juri Lelli
> Cc: Waiman Long
> Cc: Phil Auld
>
> Srikar Dronamraju (4):
>   powerpc: Refactor is_kvm_guest declaration to new header
>   powerpc: Rename is_kvm_guest to check_kvm_guest
>   powerpc: Reintroduce is_kvm_guest
>   powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted
>
>  arch/powerpc/include/asm/firmware.h  |  6 --
>  arch/powerpc/include/asm/kvm_guest.h | 25 +
>  arch/powerpc/include/asm/kvm_para.h  |  2 +-
>  arch/powerpc/include/asm/paravirt.h  | 18 ++
>  arch/powerpc/kernel/firmware.c       |  5 -
>  arch/powerpc/platforms/pseries/smp.c |  3 ++-
>  6 files changed, 50 insertions(+), 9 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/kvm_guest.h

This patch series looks good to me and the performance is nice too.

Acked-by: Waiman Long

Just curious: is the performance gain mainly from the use of static_branch
(patches 1-3) or from reducing calls to yield_count_of()?

Cheers,
Longman
Re: [PATCH kernel v3 1/2] dma: Allow mixing bypass and mapped DMA operation
On 29/10/2020 04:22, Christoph Hellwig wrote:
> On Wed, Oct 28, 2020 at 06:00:29PM +1100, Alexey Kardashevskiy wrote:
> > At the moment we allow bypassing DMA ops only when we can do this for
> > the entire RAM. However there are configs with mixed type memory where
> > we could still allow bypassing IOMMU in most cases; POWERPC with
> > persistent memory is one example.
> >
> > This adds an arch hook to determine where bypass can still work and
> > we invoke direct DMA API. The following patch checks the bus limit on
> > POWERPC to allow or disallow direct mapping.
> >
> > This adds a CONFIG_ARCH_HAS_DMA_MAP_DIRECT config option to make the
> > arch hooks no-op by default.
> >
> > Signed-off-by: Alexey Kardashevskiy
> > ---
> >  kernel/dma/mapping.c | 24
> >  kernel/dma/Kconfig   |  4
> >  2 files changed, 24 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
> > index 51bb8fa8eb89..a0bc9eb876ed 100644
> > --- a/kernel/dma/mapping.c
> > +++ b/kernel/dma/mapping.c
> > @@ -137,6 +137,18 @@ static inline bool dma_map_direct(struct device *dev,
> >  	return dma_go_direct(dev, *dev->dma_mask, ops);
> >  }
> >
> > +#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
> > +bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
> > +bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
> > +bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, int nents);
> > +bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, int nents);
> > +#else
> > +#define arch_dma_map_page_direct(d, a) (0)
> > +#define arch_dma_unmap_page_direct(d, a) (0)
> > +#define arch_dma_map_sg_direct(d, s, n) (0)
> > +#define arch_dma_unmap_sg_direct(d, s, n) (0)
> > +#endif
>
> A bunch of overly long lines here. Except for that this looks ok to me.
> If you want me to queue up the series I can just fix it up.

I thought 100 is the new limit since https://lkml.org/lkml/2020/5/29/1038
(yeah, that mentioned some Christoph :)) and keeping these on one line
does not make a huge difference, but feel free to fix them up.

Are you going to take both patches? Do you need mpe's ack?
Thanks, -- Alexey
Re: [PATCH kernel v2 1/2] dma: Allow mixing bypass and normal IOMMU operation
On 29/10/2020 04:21, Christoph Hellwig wrote:
> On Wed, Oct 28, 2020 at 05:55:23PM +1100, Alexey Kardashevskiy wrote:
> > It is passing an address of the end of the mapped area so passing a
> > page struct means passing page and offset which is an extra parameter
> > and we do not want to do anything with the page in those hooks anyway
> > so I'd keep it as is.
> >
> > > and maybe even hide the dma_map_direct inside it.
> >
> > Call dma_map_direct() from arch_dma_map_page_direct() if
> > arch_dma_map_page_direct() is defined? Seems suboptimal as it is going
> > to be bypass=true in most cases and we save one call by avoiding
> > calling arch_dma_map_page_direct(). Unless I missed something?
>
> C does not even evaluate the right hand side of a || expression if the
> left hand evaluates to true.

Right, this is what I meant. dma_map_direct() is inline and fast so I did
not want it inside the arch hook, which is not inline.

--
Alexey
Re: [PATCH 2/4] PM: hibernate: improve robustness of mapping pages in the direct map
On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote:
> +	if (IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) {
> +		unsigned long addr = (unsigned long)page_address(page);
> +		int ret;
> +
> +		if (enable)
> +			ret = set_direct_map_default_noflush(page);
> +		else
> +			ret = set_direct_map_invalid_noflush(page);
> +
> +		if (WARN_ON(ret))
> +			return;
> +
> +		flush_tlb_kernel_range(addr, addr + PAGE_SIZE);
> +	} else {
> +		debug_pagealloc_map_pages(page, 1, enable);
> +	}

Looking at the arm side again, I think this might actually introduce a
regression for the arm/hibernate/DEBUG_PAGEALLOC combo. Unlike
__kernel_map_pages(), it looks like arm's cpa will always bail in the
set_direct_map_*() functions if rodata_full is false.

So if rodata_full was disabled but debug page alloc is on, then this
would now skip remapping the pages. I guess the significance depends on
whether hibernate could actually try to save any DEBUG_PAGEALLOC
unmapped pages. Looks like it to me, though.
Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
On Wed, 2020-10-28 at 13:30 +0200, Mike Rapoport wrote:
> On Wed, Oct 28, 2020 at 11:20:12AM +, Will Deacon wrote:
> > On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote:
> > > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote:
> > > > On Mon, 2020-10-26 at 11:05 +0200, Mike Rapoport wrote:
> > > > > On Mon, Oct 26, 2020 at 01:13:52AM +, Edgecombe, Rick P wrote:
> > > > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote:
> > > > > > > Indeed, for architectures that define
> > > > > > > CONFIG_ARCH_HAS_SET_DIRECT_MAP it is possible that
> > > > > > > __kernel_map_pages() would fail, but since this function is
> > > > > > > void, the failure will go unnoticed.
> > > > > >
> > > > > > Could you elaborate on how this could happen? Do you mean
> > > > > > during runtime today or if something new was introduced?
> > > > >
> > > > > A failure in __kernel_map_pages() may happen today. For
> > > > > instance, on x86 if the kernel is built with DEBUG_PAGEALLOC.
> > > > >
> > > > > __kernel_map_pages(page, 1, 0);
> > > > >
> > > > > will need to split, say, 2M page and during the split an
> > > > > allocation of page table could fail.
> > > >
> > > > On x86 at least, DEBUG_PAGEALLOC expects to never have to break a
> > > > page on the direct map and even disables locking in cpa because
> > > > it assumes this. If this is happening somehow anyway then we
> > > > should probably fix that. Even if it's a debug feature, it will
> > > > not be as useful if it is causing its own crashes.
> > > >
> > > > I'm still wondering if there is something I'm missing here. It
> > > > seems like you are saying there is a bug in some arches, so let's
> > > > add a WARN in cross-arch code to log it as it crashes.
> > > > A warn and making things clearer seem like good ideas, but if
> > > > there is a bug we should fix it. The code around the callers
> > > > still functionally assumes re-mapping can't fail.
> > >
> > > Oh, I meant an x86 kernel *without* DEBUG_PAGEALLOC, and indeed the
> > > call that unmaps pages back in safe_copy_page will just reset a 4K
> > > page to NP because whatever made it NP in the first place already
> > > did the split.
> > >
> > > Still, on arm64 with DEBUG_PAGEALLOC=n there is a possibility of a
> > > race between the map/unmap dance in __vunmap() and safe_copy_page()
> > > that may cause access to unmapped memory:
> > >
> > > __vunmap()
> > >     vm_remove_mappings()
> > >         set_direct_map_invalid()
> > >                                  safe_copy_page()
> > >                                      __kernel_map_pages()
> > >                                          return
> > >                                      do_copy_page() -> fault
> > >
> > > This is a theoretical bug, but it is still not nice :)
> >
> > Just to clarify: this patch series fixes this problem, right?
>
> Yes.

Well, now I'm confused again. As David pointed out, __vunmap() should not
be executing simultaneously with the hibernate operation because hibernate
can't snapshot while data it needs to save is still updating. If a thread
was paused when a page was in an "invalid" state, it should be remapped by
hibernate before the copy.

To level set, before reading this mail, my takeaways from the discussions
on potential hibernate/debug page alloc problems were:

Potential RISC-V issue: Doesn't have hibernate support.
Potential ARM issue: The logic around when its cpa determines pages might
be unmapped looks correct for current callers.
Potential x86 page break issue: Seems to be ok for now, but a new
set_memory_np() caller could violate assumptions in hibernate.
Non-obvious thorny logic: General agreement it would be good to separate
dependencies.
Behavior of V1 of this patchset: No functional change other than addition
of a warn in hibernate.
So "does this fix the problem", "yes" leaves me a bit confused... Not saying there couldn't be any problems, especially due to the thorniness and cross arch stride, but what is it exactly and how does this series fix it?
Re: [PATCH] powerpc/smp: Move rcu_cpu_starting() earlier
On Wed, Oct 28, 2020 at 02:23:34PM -0400, Qian Cai wrote:
> The call to rcu_cpu_starting() in start_secondary() is not early enough
> in the CPU-hotplug onlining process, which results in lockdep splats as
> follows:
>
>   WARNING: suspicious RCU usage
>   -
>   kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!
>
>   other info that might help us debug this:
>
>   RCU used illegally from offline CPU!
>   rcu_scheduler_active = 1, debug_locks = 1
>   no locks held by swapper/1/0.
>
>   Call Trace:
>    dump_stack+0xec/0x144 (unreliable)
>    lockdep_rcu_suspicious+0x128/0x14c
>    __lock_acquire+0x1060/0x1c60
>    lock_acquire+0x140/0x5f0
>    _raw_spin_lock_irqsave+0x64/0xb0
>    clockevents_register_device+0x74/0x270
>    register_decrementer_clockevent+0x94/0x110
>    start_secondary+0x134/0x800
>    start_secondary_prolog+0x10/0x14
>
> This is avoided by moving the call to rcu_cpu_starting() up near the
> beginning of the start_secondary() function. Note that the
> raw_smp_processor_id() is required in order to avoid calling into
> lockdep before RCU has declared the CPU to be watched for readers.
>
> Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
> Signed-off-by: Qian Cai

Acked-by: Paul E. McKenney

> ---
>  arch/powerpc/kernel/smp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 3c6b9822f978..8c2857cbd960 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -1393,13 +1393,14 @@ static void add_cpu_to_masks(int cpu)
>  /* Activate a secondary processor. */
>  void start_secondary(void *unused)
>  {
> -	unsigned int cpu = smp_processor_id();
> +	unsigned int cpu = raw_smp_processor_id();
>
>  	mmgrab(&init_mm);
>  	current->active_mm = &init_mm;
>
>  	smp_store_cpu_info(cpu);
>  	set_dec(tb_ticks_per_jiffy);
> +	rcu_cpu_starting(cpu);
>  	preempt_disable();
>  	cpu_callin_map[cpu] = 1;
>
> --
> 2.28.0
Re: [PATCH 20/33] docs: ABI: testing: make the files compatible with ReST output
On Wed, Oct 28, 2020 at 03:23:18PM +0100, Mauro Carvalho Chehab wrote: > diff --git a/Documentation/ABI/testing/sysfs-uevent > b/Documentation/ABI/testing/sysfs-uevent > index aa39f8d7bcdf..d0893dad3f38 100644 > --- a/Documentation/ABI/testing/sysfs-uevent > +++ b/Documentation/ABI/testing/sysfs-uevent > @@ -19,7 +19,8 @@ Description: > a transaction identifier so it's possible to use the same > UUID > value for one or more synthetic uevents in which case we > logically group these uevents together for any userspace > -listeners. The UUID value appears in uevent as > +listeners. The UUID value appears in uevent as: I know almost nothing about Sphinx, but why have one colon here ^^^ and ... > + > "SYNTH_UUID=----" environment > variable. > > @@ -30,18 +31,19 @@ Description: > It's possible to define zero or more pairs - each pair is > then > delimited by a space character ' '. Each pair appears in > synthetic uevent as "SYNTH_ARG_KEY=VALUE". That means the KEY > -name gains "SYNTH_ARG_" prefix to avoid possible collisions > +name gains `SYNTH_ARG_` prefix to avoid possible collisions > with existing variables. > > -Example of valid sequence written to the uevent file: > +Example of valid sequence written to the uevent file:: ... two here? Thanks, Richard
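For what it's worth, the distinction Richard is asking about is standard reStructuredText syntax: a paragraph that ends in `::` introduces the indented block that follows as a literal (verbatim) block, while a single `:` is just prose punctuation. A minimal illustration (the content here is invented, only the markup matters):

```rst
The UUID value appears in uevent as:
an ordinary paragraph, so a single colon is plain punctuation.

Example of valid sequence written to the uevent file::

    KEY=VALUE

The ``::`` renders as a single ``:`` and marks the indented
line above as a literal block.
```

So the first hunk keeps flowing text (one colon), and the second deliberately turns the example into a literal block (two colons).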
[PATCH 13/13] PCI: dwc: Remove unnecessary wrappers around dw_pcie_host_init()
Many calls to dw_pcie_host_init() are in a wrapper function with nothing else now. Let's remove the pointless extra layer. Cc: Richard Zhu Cc: Lucas Stach Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: Murali Karicheri Cc: Minghuan Lian Cc: Mingkai Hu Cc: Roy Zang Cc: Yue Wang Cc: Kevin Hilman Cc: Neil Armstrong Cc: Jerome Brunet Cc: Martin Blumenstingl Cc: Jonathan Chocron Cc: Jesper Nilsson Cc: Xiaowei Song Cc: Binghui Wang Cc: Kunihiko Hayashi Cc: Masahiro Yamada Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-amlo...@lists.infradead.org Cc: linux-arm-ker...@axis.com Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-imx6.c | 26 ++--- drivers/pci/controller/dwc/pci-keystone.c | 19 +-- drivers/pci/controller/dwc/pci-layerscape.c | 26 ++--- drivers/pci/controller/dwc/pci-meson.c | 22 ++--- drivers/pci/controller/dwc/pcie-al.c| 20 ++-- drivers/pci/controller/dwc/pcie-artpec6.c | 23 +++--- drivers/pci/controller/dwc/pcie-kirin.c | 11 ++--- drivers/pci/controller/dwc/pcie-uniphier.c | 23 +++--- 8 files changed, 17 insertions(+), 153 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c index 4ba0b1195ecf..73e5cfc0725a 100644 --- a/drivers/pci/controller/dwc/pci-imx6.c +++ b/drivers/pci/controller/dwc/pci-imx6.c @@ -842,25 +842,6 @@ static const struct dw_pcie_host_ops imx6_pcie_host_ops = { .host_init = imx6_pcie_host_init, }; -static int imx6_add_pcie_port(struct imx6_pcie *imx6_pcie, - struct platform_device *pdev) -{ - struct dw_pcie *pci = imx6_pcie->pci; - struct pcie_port *pp = >pp; - struct device *dev = >dev; - int ret; - - pp->ops = _pcie_host_ops; - - ret = dw_pcie_host_init(pp); - if (ret) { - dev_err(dev, "failed to initialize host\n"); - return ret; - } - - return 0; -} - static const struct dw_pcie_ops dw_pcie_ops = { .start_link = imx6_pcie_start_link, }; @@ -1004,6 +985,7 @@ static int 
imx6_pcie_probe(struct platform_device *pdev) pci->dev = dev; pci->ops = _pcie_ops; + pci->pp.ops = _pcie_host_ops; imx6_pcie->pci = pci; imx6_pcie->drvdata = of_device_get_match_data(dev); @@ -1153,11 +1135,7 @@ static int imx6_pcie_probe(struct platform_device *pdev) if (ret) return ret; - ret = imx6_add_pcie_port(imx6_pcie, pdev); - if (ret < 0) - return ret; - - return 0; + return dw_pcie_host_init(>pp); } static void imx6_pcie_shutdown(struct platform_device *pdev) diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c index 5a4bcc2b1ddb..719756160821 100644 --- a/drivers/pci/controller/dwc/pci-keystone.c +++ b/drivers/pci/controller/dwc/pci-keystone.c @@ -844,23 +844,6 @@ static irqreturn_t ks_pcie_err_irq_handler(int irq, void *priv) return ks_pcie_handle_error_irq(ks_pcie); } -static int __init ks_pcie_add_pcie_port(struct keystone_pcie *ks_pcie, - struct platform_device *pdev) -{ - struct dw_pcie *pci = ks_pcie->pci; - struct pcie_port *pp = >pp; - struct device *dev = >dev; - int ret; - - ret = dw_pcie_host_init(pp); - if (ret) { - dev_err(dev, "failed to initialize host\n"); - return ret; - } - - return 0; -} - static void ks_pcie_am654_write_dbi2(struct dw_pcie *pci, void __iomem *base, u32 reg, size_t size, u32 val) { @@ -1255,7 +1238,7 @@ static int __init ks_pcie_probe(struct platform_device *pdev) } pci->pp.ops = host_ops; - ret = ks_pcie_add_pcie_port(ks_pcie, pdev); + ret = dw_pcie_host_init(>pp); if (ret < 0) goto err_get_sync; break; diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c index 400ebbebd00f..44ad34cdc3bc 100644 --- a/drivers/pci/controller/dwc/pci-layerscape.c +++ b/drivers/pci/controller/dwc/pci-layerscape.c @@ -232,31 +232,12 @@ static const struct of_device_id ls_pcie_of_match[] = { { }, }; -static int __init ls_add_pcie_port(struct ls_pcie *pcie) -{ - struct dw_pcie *pci = pcie->pci; - struct pcie_port *pp = >pp; - struct device *dev = 
pci->dev; - int ret; - - pp->ops = pcie->drvdata->ops; - - ret = dw_pcie_host_init(pp); - if (ret) { - dev_err(dev, "failed to initialize host\n"); - return ret; - } - - return 0; -} - static int __init ls_pcie_probe(struct
[PATCH 12/13] PCI: dwc: Move dw_pcie_setup_rc() to DWC common code
All RC complex drivers must call dw_pcie_setup_rc(). The ordering of the call shouldn't be too important other than being after any RC resets. There's a few calls of dw_pcie_setup_rc() left as drivers implementing suspend/resume need it. Cc: Kishon Vijay Abraham I Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Jingoo Han Cc: Kukjin Kim Cc: Krzysztof Kozlowski Cc: Richard Zhu Cc: Lucas Stach Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: Murali Karicheri Cc: Minghuan Lian Cc: Mingkai Hu Cc: Roy Zang Cc: Yue Wang Cc: Kevin Hilman Cc: Neil Armstrong Cc: Jerome Brunet Cc: Martin Blumenstingl Cc: Thomas Petazzoni Cc: Jesper Nilsson Cc: Gustavo Pimentel Cc: Xiaowei Song Cc: Binghui Wang Cc: Andy Gross Cc: Bjorn Andersson Cc: Stanimir Varbanov Cc: Pratyush Anand Cc: Kunihiko Hayashi Cc: Masahiro Yamada Cc: linux-o...@vger.kernel.org Cc: linux-samsung-...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-amlo...@lists.infradead.org Cc: linux-arm-ker...@axis.com Cc: linux-arm-...@vger.kernel.org Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-dra7xx.c | 1 - drivers/pci/controller/dwc/pci-exynos.c | 1 - drivers/pci/controller/dwc/pci-imx6.c | 1 - drivers/pci/controller/dwc/pci-keystone.c | 2 -- drivers/pci/controller/dwc/pci-layerscape.c | 2 -- drivers/pci/controller/dwc/pci-meson.c| 2 -- drivers/pci/controller/dwc/pcie-armada8k.c| 2 -- drivers/pci/controller/dwc/pcie-artpec6.c | 1 - drivers/pci/controller/dwc/pcie-designware-host.c | 1 + drivers/pci/controller/dwc/pcie-designware-plat.c | 8 drivers/pci/controller/dwc/pcie-histb.c | 3 --- drivers/pci/controller/dwc/pcie-kirin.c | 2 -- drivers/pci/controller/dwc/pcie-qcom.c| 1 - drivers/pci/controller/dwc/pcie-spear13xx.c | 2 -- drivers/pci/controller/dwc/pcie-uniphier.c| 2 -- 15 files changed, 1 insertion(+), 30 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c index 72a5a2bf933b..b105af63854a 
100644 --- a/drivers/pci/controller/dwc/pci-dra7xx.c +++ b/drivers/pci/controller/dwc/pci-dra7xx.c @@ -181,7 +181,6 @@ static int dra7xx_pcie_host_init(struct pcie_port *pp) struct dw_pcie *pci = to_dw_pcie_from_pp(pp); struct dra7xx_pcie *dra7xx = to_dra7xx_pcie(pci); - dw_pcie_setup_rc(pp); dra7xx_pcie_enable_interrupts(dra7xx); return 0; diff --git a/drivers/pci/controller/dwc/pci-exynos.c b/drivers/pci/controller/dwc/pci-exynos.c index 3939fe22e8a2..5c10a5432896 100644 --- a/drivers/pci/controller/dwc/pci-exynos.c +++ b/drivers/pci/controller/dwc/pci-exynos.c @@ -372,7 +372,6 @@ static int exynos_pcie_host_init(struct pcie_port *pp) phy_init(ep->phy); exynos_pcie_deassert_core_reset(ep); - dw_pcie_setup_rc(pp); exynos_pcie_assert_reset(ep); exynos_pcie_enable_interrupts(ep); diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c index c808b563486f..4ba0b1195ecf 100644 --- a/drivers/pci/controller/dwc/pci-imx6.c +++ b/drivers/pci/controller/dwc/pci-imx6.c @@ -834,7 +834,6 @@ static int imx6_pcie_host_init(struct pcie_port *pp) imx6_pcie_init_phy(imx6_pcie); imx6_pcie_deassert_core_reset(imx6_pcie); imx6_setup_phy_mpll(imx6_pcie); - dw_pcie_setup_rc(pp); return 0; } diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c index 90b222b020a3..5a4bcc2b1ddb 100644 --- a/drivers/pci/controller/dwc/pci-keystone.c +++ b/drivers/pci/controller/dwc/pci-keystone.c @@ -807,8 +807,6 @@ static int __init ks_pcie_host_init(struct pcie_port *pp) if (ret) return ret; - dw_pcie_setup_rc(pp); - ks_pcie_stop_link(pci); ks_pcie_setup_rc_app_regs(ks_pcie); writew(PCI_IO_RANGE_TYPE_32 | (PCI_IO_RANGE_TYPE_32 << 8), diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c index 0d84986c4c16..400ebbebd00f 100644 --- a/drivers/pci/controller/dwc/pci-layerscape.c +++ b/drivers/pci/controller/dwc/pci-layerscape.c @@ -136,8 +136,6 @@ static int 
ls_pcie_host_init(struct pcie_port *pp) ls_pcie_drop_msg_tlp(pcie); - dw_pcie_setup_rc(pp); - return 0; } diff --git a/drivers/pci/controller/dwc/pci-meson.c b/drivers/pci/controller/dwc/pci-meson.c index 2df0adcf0bf2..04589f0decb2 100644 --- a/drivers/pci/controller/dwc/pci-meson.c +++ b/drivers/pci/controller/dwc/pci-meson.c @@ -380,8 +380,6 @@ static int meson_pcie_host_init(struct pcie_port *pp) meson_set_max_payload(mp, MAX_PAYLOAD_SIZE); meson_set_max_rd_req_size(mp, MAX_READ_REQ_SIZE); - dw_pcie_setup_rc(pp); - return 0; } diff --git a/drivers/pci/controller/dwc/pcie-armada8k.c
[PATCH 11/13] PCI: dwc: Move dw_pcie_msi_init() into core
The host drivers which call dw_pcie_msi_init() are all the ones using the built-in MSI controller, so let's move it into the common DWC code. Cc: Kishon Vijay Abraham I Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Jingoo Han Cc: Kukjin Kim Cc: Krzysztof Kozlowski Cc: Richard Zhu Cc: Lucas Stach Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: Yue Wang Cc: Kevin Hilman Cc: Neil Armstrong Cc: Jerome Brunet Cc: Martin Blumenstingl Cc: Jesper Nilsson Cc: Gustavo Pimentel Cc: Xiaowei Song Cc: Binghui Wang Cc: Stanimir Varbanov Cc: Andy Gross Cc: Bjorn Andersson Cc: Pratyush Anand Cc: Thierry Reding Cc: Jonathan Hunter Cc: Kunihiko Hayashi Cc: Masahiro Yamada Cc: linux-o...@vger.kernel.org Cc: linux-samsung-...@vger.kernel.org Cc: linux-amlo...@lists.infradead.org Cc: linux-arm-ker...@axis.com Cc: linux-arm-...@vger.kernel.org Cc: linux-te...@vger.kernel.org Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-dra7xx.c | 2 -- drivers/pci/controller/dwc/pci-exynos.c | 4 drivers/pci/controller/dwc/pci-imx6.c | 1 - drivers/pci/controller/dwc/pci-meson.c| 1 - drivers/pci/controller/dwc/pcie-artpec6.c | 1 - drivers/pci/controller/dwc/pcie-designware-host.c | 8 +--- drivers/pci/controller/dwc/pcie-designware-plat.c | 1 - drivers/pci/controller/dwc/pcie-designware.h | 10 -- drivers/pci/controller/dwc/pcie-histb.c | 2 -- drivers/pci/controller/dwc/pcie-kirin.c | 1 - drivers/pci/controller/dwc/pcie-qcom.c| 2 -- drivers/pci/controller/dwc/pcie-spear13xx.c | 6 +- drivers/pci/controller/dwc/pcie-tegra194.c| 2 -- drivers/pci/controller/dwc/pcie-uniphier.c| 1 - 14 files changed, 6 insertions(+), 36 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c index 054423d9646d..72a5a2bf933b 100644 --- a/drivers/pci/controller/dwc/pci-dra7xx.c +++ b/drivers/pci/controller/dwc/pci-dra7xx.c @@ -182,8 +182,6 @@ static int dra7xx_pcie_host_init(struct pcie_port *pp) struct dra7xx_pcie 
*dra7xx = to_dra7xx_pcie(pci); dw_pcie_setup_rc(pp); - - dw_pcie_msi_init(pp); dra7xx_pcie_enable_interrupts(dra7xx); return 0; diff --git a/drivers/pci/controller/dwc/pci-exynos.c b/drivers/pci/controller/dwc/pci-exynos.c index 6498b615c834..3939fe22e8a2 100644 --- a/drivers/pci/controller/dwc/pci-exynos.c +++ b/drivers/pci/controller/dwc/pci-exynos.c @@ -273,12 +273,8 @@ static irqreturn_t exynos_pcie_irq_handler(int irq, void *arg) static void exynos_pcie_msi_init(struct exynos_pcie *ep) { - struct dw_pcie *pci = ep->pci; - struct pcie_port *pp = >pp; u32 val; - dw_pcie_msi_init(pp); - /* enable MSI interrupt */ val = exynos_pcie_readl(ep->mem_res->elbi_base, PCIE_IRQ_EN_LEVEL); val |= IRQ_MSI_ENABLE; diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c index 20e249efb02c..c808b563486f 100644 --- a/drivers/pci/controller/dwc/pci-imx6.c +++ b/drivers/pci/controller/dwc/pci-imx6.c @@ -835,7 +835,6 @@ static int imx6_pcie_host_init(struct pcie_port *pp) imx6_pcie_deassert_core_reset(imx6_pcie); imx6_setup_phy_mpll(imx6_pcie); dw_pcie_setup_rc(pp); - dw_pcie_msi_init(pp); return 0; } diff --git a/drivers/pci/controller/dwc/pci-meson.c b/drivers/pci/controller/dwc/pci-meson.c index 41a3351b100b..2df0adcf0bf2 100644 --- a/drivers/pci/controller/dwc/pci-meson.c +++ b/drivers/pci/controller/dwc/pci-meson.c @@ -381,7 +381,6 @@ static int meson_pcie_host_init(struct pcie_port *pp) meson_set_max_rd_req_size(mp, MAX_READ_REQ_SIZE); dw_pcie_setup_rc(pp); - dw_pcie_msi_init(pp); return 0; } diff --git a/drivers/pci/controller/dwc/pcie-artpec6.c b/drivers/pci/controller/dwc/pcie-artpec6.c index 8b3da3038ac3..7ee8f3c83f8f 100644 --- a/drivers/pci/controller/dwc/pcie-artpec6.c +++ b/drivers/pci/controller/dwc/pcie-artpec6.c @@ -329,7 +329,6 @@ static int artpec6_pcie_host_init(struct pcie_port *pp) artpec6_pcie_deassert_core_reset(artpec6_pcie); artpec6_pcie_wait_for_phy(artpec6_pcie); dw_pcie_setup_rc(pp); - dw_pcie_msi_init(pp); return 0; 
} diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c index f5f9d4e58aa3..025514e00a42 100644 --- a/drivers/pci/controller/dwc/pcie-designware-host.c +++ b/drivers/pci/controller/dwc/pcie-designware-host.c @@ -256,7 +256,7 @@ int dw_pcie_allocate_domains(struct pcie_port *pp) return 0; } -void dw_pcie_free_msi(struct pcie_port *pp) +static void dw_pcie_free_msi(struct pcie_port *pp) { if (pp->msi_irq) { irq_set_chained_handler(pp->msi_irq, NULL); @@ -275,12 +275,12 @@ void
[PATCH 10/13] PCI: dwc: Move link handling into common code
All the DWC drivers do link setup and checks at roughly the same time. Let's use the existing .start_link() hook (currently only used in EP mode) and move the link handling to the core code. The behavior for a link down was inconsistent as some drivers would fail probe in that case while others succeed. Let's standardize this to succeed as there are usecases where devices (and the link) appear later even without hotplug. For example, a reconfigured FPGA device. Cc: Kishon Vijay Abraham I Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Jingoo Han Cc: Kukjin Kim Cc: Krzysztof Kozlowski Cc: Richard Zhu Cc: Lucas Stach Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: Murali Karicheri Cc: Yue Wang Cc: Kevin Hilman Cc: Neil Armstrong Cc: Jerome Brunet Cc: Martin Blumenstingl Cc: Thomas Petazzoni Cc: Jesper Nilsson Cc: Gustavo Pimentel Cc: Xiaowei Song Cc: Binghui Wang Cc: Andy Gross Cc: Bjorn Andersson Cc: Stanimir Varbanov Cc: Pratyush Anand Cc: Thierry Reding Cc: Jonathan Hunter Cc: Kunihiko Hayashi Cc: Masahiro Yamada Cc: linux-o...@vger.kernel.org Cc: linux-samsung-...@vger.kernel.org Cc: linux-amlo...@lists.infradead.org Cc: linux-arm-ker...@axis.com Cc: linux-arm-...@vger.kernel.org Cc: linux-te...@vger.kernel.org Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-dra7xx.c | 2 - drivers/pci/controller/dwc/pci-exynos.c | 41 +++-- drivers/pci/controller/dwc/pci-imx6.c | 9 ++-- drivers/pci/controller/dwc/pci-keystone.c | 9 drivers/pci/controller/dwc/pci-meson.c| 24 -- drivers/pci/controller/dwc/pcie-armada8k.c| 39 +++- drivers/pci/controller/dwc/pcie-artpec6.c | 2 - .../pci/controller/dwc/pcie-designware-host.c | 9 .../pci/controller/dwc/pcie-designware-plat.c | 3 -- drivers/pci/controller/dwc/pcie-histb.c | 34 +++--- drivers/pci/controller/dwc/pcie-kirin.c | 23 ++ drivers/pci/controller/dwc/pcie-qcom.c| 19 ++-- drivers/pci/controller/dwc/pcie-spear13xx.c | 46 --- drivers/pci/controller/dwc/pcie-tegra194.c| 1 - 
drivers/pci/controller/dwc/pcie-uniphier.c| 13 ++ 15 files changed, 103 insertions(+), 171 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c index 6b75c68dddb5..054423d9646d 100644 --- a/drivers/pci/controller/dwc/pci-dra7xx.c +++ b/drivers/pci/controller/dwc/pci-dra7xx.c @@ -183,8 +183,6 @@ static int dra7xx_pcie_host_init(struct pcie_port *pp) dw_pcie_setup_rc(pp); - dra7xx_pcie_establish_link(pci); - dw_pcie_wait_for_link(pci); dw_pcie_msi_init(pp); dra7xx_pcie_enable_interrupts(dra7xx); diff --git a/drivers/pci/controller/dwc/pci-exynos.c b/drivers/pci/controller/dwc/pci-exynos.c index 7734394953e5..6498b615c834 100644 --- a/drivers/pci/controller/dwc/pci-exynos.c +++ b/drivers/pci/controller/dwc/pci-exynos.c @@ -229,30 +229,9 @@ static void exynos_pcie_assert_reset(struct exynos_pcie *ep) GPIOF_OUT_INIT_HIGH, "RESET"); } -static int exynos_pcie_establish_link(struct exynos_pcie *ep) +static int exynos_pcie_start_link(struct dw_pcie *pci) { - struct dw_pcie *pci = ep->pci; - struct pcie_port *pp = >pp; - struct device *dev = pci->dev; - - if (dw_pcie_link_up(pci)) { - dev_err(dev, "Link already up\n"); - return 0; - } - - exynos_pcie_assert_core_reset(ep); - - phy_reset(ep->phy); - - exynos_pcie_writel(ep->mem_res->elbi_base, 1, - PCIE_PWR_RESET); - - phy_power_on(ep->phy); - phy_init(ep->phy); - - exynos_pcie_deassert_core_reset(ep); - dw_pcie_setup_rc(pp); - exynos_pcie_assert_reset(ep); + struct exynos_pcie *ep = to_exynos_pcie(pci); /* assert LTSSM enable */ exynos_pcie_writel(ep->mem_res->elbi_base, PCIE_ELBI_LTSSM_ENABLE, @@ -386,7 +365,20 @@ static int exynos_pcie_host_init(struct pcie_port *pp) pp->bridge->ops = _pci_ops; - exynos_pcie_establish_link(ep); + exynos_pcie_assert_core_reset(ep); + + phy_reset(ep->phy); + + exynos_pcie_writel(ep->mem_res->elbi_base, 1, + PCIE_PWR_RESET); + + phy_power_on(ep->phy); + phy_init(ep->phy); + + exynos_pcie_deassert_core_reset(ep); + dw_pcie_setup_rc(pp); + 
exynos_pcie_assert_reset(ep); + exynos_pcie_enable_interrupts(ep); return 0; @@ -430,6 +422,7 @@ static const struct dw_pcie_ops dw_pcie_ops = { .read_dbi = exynos_pcie_read_dbi, .write_dbi = exynos_pcie_write_dbi, .link_up = exynos_pcie_link_up, + .start_link = exynos_pcie_start_link, }; static int __init exynos_pcie_probe(struct platform_device *pdev) diff --git a/drivers/pci/controller/dwc/pci-imx6.c
[PATCH 09/13] PCI: dwc: Rework MSI initialization
There are 3 possible MSI implementations for the DWC host. The first is using the built-in DWC MSI controller. The 2nd is a custom MSI controller as part of the PCI host (keystone only). The 3rd is an external MSI controller (typically GICv3 ITS). Currently, the last 2 are distinguished with a .msi_host_init() hook with the 3rd option using an empty function. However we can detect the 3rd case with the presence of 'msi-parent' or 'msi-map' properties, so let's do that instead and remove the empty functions. Cc: Murali Karicheri Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Minghuan Lian Cc: Mingkai Hu Cc: Roy Zang Cc: Jingoo Han Cc: Gustavo Pimentel Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-keystone.c | 9 --- drivers/pci/controller/dwc/pci-layerscape.c | 25 --- .../pci/controller/dwc/pcie-designware-host.c | 20 +-- drivers/pci/controller/dwc/pcie-designware.h | 1 + drivers/pci/controller/dwc/pcie-intel-gw.c| 9 --- 5 files changed, 13 insertions(+), 51 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c index 9cf14f13798b..784385ae6074 100644 --- a/drivers/pci/controller/dwc/pci-keystone.c +++ b/drivers/pci/controller/dwc/pci-keystone.c @@ -272,14 +272,6 @@ static void ks_pcie_handle_legacy_irq(struct keystone_pcie *ks_pcie, ks_pcie_app_writel(ks_pcie, IRQ_EOI, offset); } -/* - * Dummy function so that DW core doesn't configure MSI - */ -static int ks_pcie_am654_msi_host_init(struct pcie_port *pp) -{ - return 0; -} - static void ks_pcie_enable_error_irq(struct keystone_pcie *ks_pcie) { ks_pcie_app_writel(ks_pcie, ERR_IRQ_ENABLE_SET, ERR_IRQ_ALL); @@ -854,7 +846,6 @@ static const struct dw_pcie_host_ops ks_pcie_host_ops = { static const struct dw_pcie_host_ops ks_pcie_am654_host_ops = { .host_init = ks_pcie_host_init, - .msi_host_init = ks_pcie_am654_msi_host_init, }; static irqreturn_t ks_pcie_err_irq_handler(int irq, void *priv) diff --git 
a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c index 53e56d54c482..0d84986c4c16 100644 --- a/drivers/pci/controller/dwc/pci-layerscape.c +++ b/drivers/pci/controller/dwc/pci-layerscape.c @@ -168,37 +168,12 @@ static int ls1021_pcie_host_init(struct pcie_port *pp) return ls_pcie_host_init(pp); } -static int ls_pcie_msi_host_init(struct pcie_port *pp) -{ - struct dw_pcie *pci = to_dw_pcie_from_pp(pp); - struct device *dev = pci->dev; - struct device_node *np = dev->of_node; - struct device_node *msi_node; - - /* -* The MSI domain is set by the generic of_msi_configure(). This -* .msi_host_init() function keeps us from doing the default MSI -* domain setup in dw_pcie_host_init() and also enforces the -* requirement that "msi-parent" exists. -*/ - msi_node = of_parse_phandle(np, "msi-parent", 0); - if (!msi_node) { - dev_err(dev, "failed to find msi-parent\n"); - return -EINVAL; - } - - of_node_put(msi_node); - return 0; -} - static const struct dw_pcie_host_ops ls1021_pcie_host_ops = { .host_init = ls1021_pcie_host_init, - .msi_host_init = ls_pcie_msi_host_init, }; static const struct dw_pcie_host_ops ls_pcie_host_ops = { .host_init = ls_pcie_host_init, - .msi_host_init = ls_pcie_msi_host_init, }; static const struct dw_pcie_ops dw_ls1021_pcie_ops = { diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c index 0f77e4d4b385..6cebdd9bbd2e 100644 --- a/drivers/pci/controller/dwc/pcie-designware-host.c +++ b/drivers/pci/controller/dwc/pcie-designware-host.c @@ -365,6 +365,10 @@ int dw_pcie_host_init(struct pcie_port *pp) pci->link_gen = of_pci_get_max_link_speed(np); if (pci_msi_enabled()) { + pp->has_msi_ctrl = !(pp->ops->msi_host_init || +of_property_read_bool(np, "msi-parent") || +of_property_read_bool(np, "msi-map")); + if (!pp->num_vectors) { pp->num_vectors = MSI_DEF_NUM_VECTORS; } else if (pp->num_vectors > MAX_MSI_IRQS) { @@ -372,7 +376,11 @@ int 
dw_pcie_host_init(struct pcie_port *pp) return -EINVAL; } - if (!pp->ops->msi_host_init) { + if (pp->ops->msi_host_init) { + ret = pp->ops->msi_host_init(pp); + if (ret < 0) + return ret; + } else if (pp->has_msi_ctrl) { if (!pp->msi_irq) { pp->msi_irq = platform_get_irq_byname(pdev, "msi"); if (pp->msi_irq < 0) {
[PATCH 08/13] PCI: dwc: Move MSI interrupt setup into DWC common code
Platforms using the built-in DWC MSI controller all have a dedicated interrupt with "msi" name or at index 0, so let's move setting up the interrupt to the common DWC code. spear13xx and dra7xx are the 2 oddballs with muxed interrupts, so we need to prevent configuring the MSI interrupt by setting msi_irq to negative. Cc: Jingoo Han Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Kukjin Kim Cc: Krzysztof Kozlowski Cc: Richard Zhu Cc: Lucas Stach Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Cc: Yue Wang Cc: Kevin Hilman Cc: Neil Armstrong Cc: Jerome Brunet Cc: Martin Blumenstingl Cc: Jesper Nilsson Cc: Gustavo Pimentel Cc: Xiaowei Song Cc: Binghui Wang Cc: Stanimir Varbanov Cc: Andy Gross Cc: Bjorn Andersson Cc: Pratyush Anand Cc: Thierry Reding Cc: Jonathan Hunter Cc: Kunihiko Hayashi Cc: Masahiro Yamada Cc: linux-samsung-...@vger.kernel.org Cc: linux-amlo...@lists.infradead.org Cc: linux-arm-ker...@axis.com Cc: linux-arm-...@vger.kernel.org Cc: linux-te...@vger.kernel.org Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-dra7xx.c | 3 +++ drivers/pci/controller/dwc/pci-exynos.c | 6 - drivers/pci/controller/dwc/pci-imx6.c | 6 - drivers/pci/controller/dwc/pci-meson.c| 6 - drivers/pci/controller/dwc/pcie-artpec6.c | 6 - .../pci/controller/dwc/pcie-designware-host.c | 11 +- .../pci/controller/dwc/pcie-designware-plat.c | 6 - drivers/pci/controller/dwc/pcie-histb.c | 6 - drivers/pci/controller/dwc/pcie-kirin.c | 22 --- drivers/pci/controller/dwc/pcie-qcom.c| 8 --- drivers/pci/controller/dwc/pcie-spear13xx.c | 1 + drivers/pci/controller/dwc/pcie-tegra194.c| 8 --- drivers/pci/controller/dwc/pcie-uniphier.c| 6 - 13 files changed, 14 insertions(+), 81 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c index 4d0c35a4aa59..6b75c68dddb5 100644 --- a/drivers/pci/controller/dwc/pci-dra7xx.c +++ b/drivers/pci/controller/dwc/pci-dra7xx.c @@ -489,6 +489,9 @@ static 
int __init dra7xx_add_pcie_port(struct dra7xx_pcie *dra7xx, if (pp->irq < 0) return pp->irq; + /* MSI IRQ is muxed */ + pp->msi_irq = -ENODEV; + ret = dra7xx_pcie_init_irq_domain(pp); if (ret < 0) return ret; diff --git a/drivers/pci/controller/dwc/pci-exynos.c b/drivers/pci/controller/dwc/pci-exynos.c index 242683cde04a..7734394953e5 100644 --- a/drivers/pci/controller/dwc/pci-exynos.c +++ b/drivers/pci/controller/dwc/pci-exynos.c @@ -415,12 +415,6 @@ static int __init exynos_add_pcie_port(struct exynos_pcie *ep, return ret; } - if (IS_ENABLED(CONFIG_PCI_MSI)) { - pp->msi_irq = platform_get_irq(pdev, 0); - if (pp->msi_irq < 0) - return pp->msi_irq; - } - pp->ops = _pcie_host_ops; ret = dw_pcie_host_init(pp); diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c index 7dd137d62dca..1b979c956fcd 100644 --- a/drivers/pci/controller/dwc/pci-imx6.c +++ b/drivers/pci/controller/dwc/pci-imx6.c @@ -853,12 +853,6 @@ static int imx6_add_pcie_port(struct imx6_pcie *imx6_pcie, struct device *dev = >dev; int ret; - if (IS_ENABLED(CONFIG_PCI_MSI)) { - pp->msi_irq = platform_get_irq_byname(pdev, "msi"); - if (pp->msi_irq < 0) - return pp->msi_irq; - } - pp->ops = _pcie_host_ops; ret = dw_pcie_host_init(pp); diff --git a/drivers/pci/controller/dwc/pci-meson.c b/drivers/pci/controller/dwc/pci-meson.c index 1913dc2c8fa0..10d65b3093e4 100644 --- a/drivers/pci/controller/dwc/pci-meson.c +++ b/drivers/pci/controller/dwc/pci-meson.c @@ -405,12 +405,6 @@ static int meson_add_pcie_port(struct meson_pcie *mp, struct device *dev = >dev; int ret; - if (IS_ENABLED(CONFIG_PCI_MSI)) { - pp->msi_irq = platform_get_irq(pdev, 0); - if (pp->msi_irq < 0) - return pp->msi_irq; - } - pp->ops = _pcie_host_ops; ret = dw_pcie_host_init(pp); diff --git a/drivers/pci/controller/dwc/pcie-artpec6.c b/drivers/pci/controller/dwc/pcie-artpec6.c index 52ad7909cd0c..a5239a58cee0 100644 --- a/drivers/pci/controller/dwc/pcie-artpec6.c +++ 
b/drivers/pci/controller/dwc/pcie-artpec6.c @@ -348,12 +348,6 @@ static int artpec6_add_pcie_port(struct artpec6_pcie *artpec6_pcie, struct device *dev = pci->dev; int ret; - if (IS_ENABLED(CONFIG_PCI_MSI)) { - pp->msi_irq = platform_get_irq_byname(pdev, "msi"); - if (pp->msi_irq < 0) - return pp->msi_irq; - } - pp->ops = _pcie_host_ops; ret = dw_pcie_host_init(pp); diff --git
[PATCH 07/13] PCI: dwc: Drop the .set_num_vectors() host op
There's no reason for the .set_num_vectors() host op. Drivers needing a non-default value can just initialize pcie_port.num_vectors directly. Cc: Jingoo Han Cc: Gustavo Pimentel Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Thierry Reding Cc: Jonathan Hunter Cc: linux-te...@vger.kernel.org Signed-off-by: Rob Herring --- .../pci/controller/dwc/pcie-designware-host.c | 19 --- .../pci/controller/dwc/pcie-designware-plat.c | 7 +-- drivers/pci/controller/dwc/pcie-designware.h | 1 - drivers/pci/controller/dwc/pcie-tegra194.c| 7 +-- 4 files changed, 6 insertions(+), 28 deletions(-) diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c index 265a48f1a0ae..1bd6a9762426 100644 --- a/drivers/pci/controller/dwc/pcie-designware-host.c +++ b/drivers/pci/controller/dwc/pcie-designware-host.c @@ -365,22 +365,11 @@ int dw_pcie_host_init(struct pcie_port *pp) pci->link_gen = of_pci_get_max_link_speed(np); if (pci_msi_enabled()) { - /* -* If a specific SoC driver needs to change the -* default number of vectors, it needs to implement -* the set_num_vectors callback. 
-*/ - if (!pp->ops->set_num_vectors) { + if (!pp->num_vectors) { pp->num_vectors = MSI_DEF_NUM_VECTORS; - } else { - pp->ops->set_num_vectors(pp); - - if (pp->num_vectors > MAX_MSI_IRQS || - pp->num_vectors == 0) { - dev_err(dev, - "Invalid number of vectors\n"); - return -EINVAL; - } + } else if (pp->num_vectors > MAX_MSI_IRQS) { + dev_err(dev, "Invalid number of vectors\n"); + return -EINVAL; } if (!pp->ops->msi_host_init) { diff --git a/drivers/pci/controller/dwc/pcie-designware-plat.c b/drivers/pci/controller/dwc/pcie-designware-plat.c index 562a05e07b1d..13fede1d4157 100644 --- a/drivers/pci/controller/dwc/pcie-designware-plat.c +++ b/drivers/pci/controller/dwc/pcie-designware-plat.c @@ -44,14 +44,8 @@ static int dw_plat_pcie_host_init(struct pcie_port *pp) return 0; } -static void dw_plat_set_num_vectors(struct pcie_port *pp) -{ - pp->num_vectors = MAX_MSI_IRQS; -} - static const struct dw_pcie_host_ops dw_plat_pcie_host_ops = { .host_init = dw_plat_pcie_host_init, - .set_num_vectors = dw_plat_set_num_vectors, }; static int dw_plat_pcie_establish_link(struct dw_pcie *pci) @@ -128,6 +122,7 @@ static int dw_plat_add_pcie_port(struct dw_plat_pcie *dw_plat_pcie, return pp->msi_irq; } + pp->num_vectors = MAX_MSI_IRQS; pp->ops = _plat_pcie_host_ops; ret = dw_pcie_host_init(pp); diff --git a/drivers/pci/controller/dwc/pcie-designware.h b/drivers/pci/controller/dwc/pcie-designware.h index ed19c34dd0fe..96382dcb2859 100644 --- a/drivers/pci/controller/dwc/pcie-designware.h +++ b/drivers/pci/controller/dwc/pcie-designware.h @@ -171,7 +171,6 @@ enum dw_pcie_device_mode { struct dw_pcie_host_ops { int (*host_init)(struct pcie_port *pp); - void (*set_num_vectors)(struct pcie_port *pp); int (*msi_host_init)(struct pcie_port *pp); }; diff --git a/drivers/pci/controller/dwc/pcie-tegra194.c b/drivers/pci/controller/dwc/pcie-tegra194.c index f8fca6794282..5e2841f58700 100644 --- a/drivers/pci/controller/dwc/pcie-tegra194.c +++ b/drivers/pci/controller/dwc/pcie-tegra194.c @@ 
-990,11 +990,6 @@ static int tegra_pcie_dw_link_up(struct dw_pcie *pci) return !!(val & PCI_EXP_LNKSTA_DLLLA); } -static void tegra_pcie_set_msi_vec_num(struct pcie_port *pp) -{ - pp->num_vectors = MAX_MSI_IRQS; -} - static int tegra_pcie_dw_start_link(struct dw_pcie *pci) { struct tegra_pcie_dw *pcie = to_tegra_pcie(pci); @@ -1019,7 +1014,6 @@ static const struct dw_pcie_ops tegra_dw_pcie_ops = { static struct dw_pcie_host_ops tegra_pcie_dw_host_ops = { .host_init = tegra_pcie_dw_host_init, - .set_num_vectors = tegra_pcie_set_msi_vec_num, }; static void tegra_pcie_disable_phy(struct tegra_pcie_dw *pcie) @@ -1995,6 +1989,7 @@ static int tegra_pcie_dw_probe(struct platform_device *pdev) pci->n_fts[1] = FTS_VAL; pp = >pp; + pp->num_vectors = MAX_MSI_IRQS; pcie->dev = >dev; pcie->mode = (enum dw_pcie_device_mode)data->mode; -- 2.25.1
[PATCH 06/13] PCI: dwc/dra7xx: Use the common MSI irq_chip
The dra7xx MSI irq_chip implementation is identical to the default DWC one. The only difference is the interrupt handler as the MSI interrupt is muxed with other interrupts, but that doesn't affect the irq_chip part of it. Cc: Kishon Vijay Abraham I Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: linux-o...@vger.kernel.org Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-dra7xx.c | 125 1 file changed, 125 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c index a4aabc85dbb1..4d0c35a4aa59 100644 --- a/drivers/pci/controller/dwc/pci-dra7xx.c +++ b/drivers/pci/controller/dwc/pci-dra7xx.c @@ -377,133 +377,8 @@ static int dra7xx_pcie_init_irq_domain(struct pcie_port *pp) return 0; } -static void dra7xx_pcie_setup_msi_msg(struct irq_data *d, struct msi_msg *msg) -{ - struct pcie_port *pp = irq_data_get_irq_chip_data(d); - struct dw_pcie *pci = to_dw_pcie_from_pp(pp); - u64 msi_target; - - msi_target = (u64)pp->msi_data; - - msg->address_lo = lower_32_bits(msi_target); - msg->address_hi = upper_32_bits(msi_target); - - msg->data = d->hwirq; - - dev_dbg(pci->dev, "msi#%d address_hi %#x address_lo %#x\n", - (int)d->hwirq, msg->address_hi, msg->address_lo); -} - -static int dra7xx_pcie_msi_set_affinity(struct irq_data *d, - const struct cpumask *mask, - bool force) -{ - return -EINVAL; -} - -static void dra7xx_pcie_bottom_mask(struct irq_data *d) -{ - struct pcie_port *pp = irq_data_get_irq_chip_data(d); - struct dw_pcie *pci = to_dw_pcie_from_pp(pp); - unsigned int res, bit, ctrl; - unsigned long flags; - - raw_spin_lock_irqsave(>lock, flags); - - ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL; - res = ctrl * MSI_REG_CTRL_BLOCK_SIZE; - bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL; - - pp->irq_mask[ctrl] |= BIT(bit); - dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res, - pp->irq_mask[ctrl]); - - raw_spin_unlock_irqrestore(>lock, flags); -} - -static void dra7xx_pcie_bottom_unmask(struct irq_data *d) -{ - struct pcie_port 
*pp = irq_data_get_irq_chip_data(d); - struct dw_pcie *pci = to_dw_pcie_from_pp(pp); - unsigned int res, bit, ctrl; - unsigned long flags; - - raw_spin_lock_irqsave(>lock, flags); - - ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL; - res = ctrl * MSI_REG_CTRL_BLOCK_SIZE; - bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL; - - pp->irq_mask[ctrl] &= ~BIT(bit); - dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + res, - pp->irq_mask[ctrl]); - - raw_spin_unlock_irqrestore(>lock, flags); -} - -static void dra7xx_pcie_bottom_ack(struct irq_data *d) -{ - struct pcie_port *pp = irq_data_get_irq_chip_data(d); - struct dw_pcie *pci = to_dw_pcie_from_pp(pp); - unsigned int res, bit, ctrl; - - ctrl = d->hwirq / MAX_MSI_IRQS_PER_CTRL; - res = ctrl * MSI_REG_CTRL_BLOCK_SIZE; - bit = d->hwirq % MAX_MSI_IRQS_PER_CTRL; - - dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_STATUS + res, BIT(bit)); -} - -static struct irq_chip dra7xx_pci_msi_bottom_irq_chip = { - .name = "DRA7XX-PCI-MSI", - .irq_ack = dra7xx_pcie_bottom_ack, - .irq_compose_msi_msg = dra7xx_pcie_setup_msi_msg, - .irq_set_affinity = dra7xx_pcie_msi_set_affinity, - .irq_mask = dra7xx_pcie_bottom_mask, - .irq_unmask = dra7xx_pcie_bottom_unmask, -}; - -static int dra7xx_pcie_msi_host_init(struct pcie_port *pp) -{ - struct dw_pcie *pci = to_dw_pcie_from_pp(pp); - struct device *dev = pci->dev; - u32 ctrl, num_ctrls; - int ret; - - pp->msi_irq_chip = _pci_msi_bottom_irq_chip; - - num_ctrls = pp->num_vectors / MAX_MSI_IRQS_PER_CTRL; - /* Initialize IRQ Status array */ - for (ctrl = 0; ctrl < num_ctrls; ctrl++) { - pp->irq_mask[ctrl] = ~0; - dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_MASK + - (ctrl * MSI_REG_CTRL_BLOCK_SIZE), - pp->irq_mask[ctrl]); - dw_pcie_writel_dbi(pci, PCIE_MSI_INTR0_ENABLE + - (ctrl * MSI_REG_CTRL_BLOCK_SIZE), - ~0); - } - - ret = dw_pcie_allocate_domains(pp); - if (ret) - return ret; - - pp->msi_data = dma_map_single_attrs(dev, >msi_msg, - sizeof(pp->msi_msg), - DMA_FROM_DEVICE, - DMA_ATTR_SKIP_CPU_SYNC); - ret = dma_mapping_error(dev, 
pp->msi_data); - if (ret) { - dev_err(dev, "Failed to map MSI data\n"); - pp->msi_data = 0; -
[PATCH 05/13] PCI: dwc: Ensure all outbound ATU windows are reset
The Layerscape driver clears the ATU registers which may have been configured by the bootloader. Any driver could have the same issue and doing it for all drivers doesn't hurt, so let's move it into the common DWC code. Cc: Minghuan Lian Cc: Mingkai Hu Cc: Roy Zang Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Jingoo Han Cc: Gustavo Pimentel Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-layerscape.c | 14 -- drivers/pci/controller/dwc/pcie-designware-host.c | 5 + 2 files changed, 5 insertions(+), 14 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c index f24f79a70d9a..53e56d54c482 100644 --- a/drivers/pci/controller/dwc/pci-layerscape.c +++ b/drivers/pci/controller/dwc/pci-layerscape.c @@ -83,14 +83,6 @@ static void ls_pcie_drop_msg_tlp(struct ls_pcie *pcie) iowrite32(val, pci->dbi_base + PCIE_STRFMR1); } -static void ls_pcie_disable_outbound_atus(struct ls_pcie *pcie) -{ - int i; - - for (i = 0; i < PCIE_IATU_NUM; i++) - dw_pcie_disable_atu(pcie->pci, i, DW_PCIE_REGION_OUTBOUND); -} - static int ls1021_pcie_link_up(struct dw_pcie *pci) { u32 state; @@ -136,12 +128,6 @@ static int ls_pcie_host_init(struct pcie_port *pp) struct dw_pcie *pci = to_dw_pcie_from_pp(pp); struct ls_pcie *pcie = to_ls_pcie(pci); - /* -* Disable outbound windows configured by the bootloader to avoid -* one transaction hitting multiple outbound windows. -* dw_pcie_setup_rc() will reconfigure the outbound windows. 
-*/ - ls_pcie_disable_outbound_atus(pcie); ls_pcie_fix_error_response(pcie); dw_pcie_dbi_ro_wr_en(pci); diff --git a/drivers/pci/controller/dwc/pcie-designware-host.c b/drivers/pci/controller/dwc/pcie-designware-host.c index cde45b2076ee..265a48f1a0ae 100644 --- a/drivers/pci/controller/dwc/pcie-designware-host.c +++ b/drivers/pci/controller/dwc/pcie-designware-host.c @@ -534,6 +534,7 @@ static struct pci_ops dw_pcie_ops = { void dw_pcie_setup_rc(struct pcie_port *pp) { + int i; u32 val, ctrl, num_ctrls; struct dw_pcie *pci = to_dw_pcie_from_pp(pp); @@ -583,6 +584,10 @@ void dw_pcie_setup_rc(struct pcie_port *pp) PCI_COMMAND_MASTER | PCI_COMMAND_SERR; dw_pcie_writel_dbi(pci, PCI_COMMAND, val); + /* Ensure all outbound windows are disabled so there are no multiple matches */ + for (i = 0; i < pci->num_viewport; i++) + dw_pcie_disable_atu(pci, i, DW_PCIE_REGION_OUTBOUND); + /* * If the platform provides its own child bus config accesses, it means * the platform uses its own address translation component rather than -- 2.25.1
[PATCH 04/13] PCI: dwc/intel-gw: Remove some unneeded function wrappers
Remove some of the pointless levels of functions that just wrap or group a series of other functions. Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pcie-intel-gw.c | 47 -- 1 file changed, 16 insertions(+), 31 deletions(-) diff --git a/drivers/pci/controller/dwc/pcie-intel-gw.c b/drivers/pci/controller/dwc/pcie-intel-gw.c index 88782653ed21..c562eb7454b1 100644 --- a/drivers/pci/controller/dwc/pcie-intel-gw.c +++ b/drivers/pci/controller/dwc/pcie-intel-gw.c @@ -152,19 +152,6 @@ static void intel_pcie_init_n_fts(struct dw_pcie *pci) pci->n_fts[0] = PORT_AFR_N_FTS_GEN12_DFT; } -static void intel_pcie_rc_setup(struct intel_pcie_port *lpp) -{ - struct dw_pcie *pci = >pci; - - pci->atu_base = pci->dbi_base + 0xC; - - intel_pcie_ltssm_disable(lpp); - intel_pcie_link_setup(lpp); - intel_pcie_init_n_fts(pci); - dw_pcie_setup_rc(>pp); - dw_pcie_upconfig_setup(pci); -} - static int intel_pcie_ep_rst_init(struct intel_pcie_port *lpp) { struct device *dev = lpp->pci.dev; @@ -216,14 +203,6 @@ static void intel_pcie_device_rst_deassert(struct intel_pcie_port *lpp) gpiod_set_value_cansleep(lpp->reset_gpio, 0); } -static int intel_pcie_app_logic_setup(struct intel_pcie_port *lpp) -{ - intel_pcie_device_rst_deassert(lpp); - intel_pcie_ltssm_enable(lpp); - - return dw_pcie_wait_for_link(>pci); -} - static void intel_pcie_core_irq_disable(struct intel_pcie_port *lpp) { pcie_app_wr(lpp, PCIE_APP_IRNEN, 0); @@ -273,11 +252,6 @@ static int intel_pcie_get_resources(struct platform_device *pdev) return 0; } -static void intel_pcie_deinit_phy(struct intel_pcie_port *lpp) -{ - phy_exit(lpp->phy); -} - static int intel_pcie_wait_l2(struct intel_pcie_port *lpp) { u32 value; @@ -314,6 +288,7 @@ static void intel_pcie_turn_off(struct intel_pcie_port *lpp) static int intel_pcie_host_setup(struct intel_pcie_port *lpp) { int ret; + struct dw_pcie *pci = >pci; intel_pcie_core_rst_assert(lpp); intel_pcie_device_rst_assert(lpp); @@ -330,8 +305,18 @@ 
static int intel_pcie_host_setup(struct intel_pcie_port *lpp) goto clk_err; } - intel_pcie_rc_setup(lpp); - ret = intel_pcie_app_logic_setup(lpp); + pci->atu_base = pci->dbi_base + 0xC; + + intel_pcie_ltssm_disable(lpp); + intel_pcie_link_setup(lpp); + intel_pcie_init_n_fts(pci); + dw_pcie_setup_rc(>pp); + dw_pcie_upconfig_setup(pci); + + intel_pcie_device_rst_deassert(lpp); + intel_pcie_ltssm_enable(lpp); + + ret = dw_pcie_wait_for_link(pci); if (ret) goto app_init_err; @@ -345,7 +330,7 @@ static int intel_pcie_host_setup(struct intel_pcie_port *lpp) clk_disable_unprepare(lpp->core_clk); clk_err: intel_pcie_core_rst_assert(lpp); - intel_pcie_deinit_phy(lpp); + phy_exit(lpp->phy); return ret; } @@ -356,7 +341,7 @@ static void __intel_pcie_remove(struct intel_pcie_port *lpp) intel_pcie_turn_off(lpp); clk_disable_unprepare(lpp->core_clk); intel_pcie_core_rst_assert(lpp); - intel_pcie_deinit_phy(lpp); + phy_exit(lpp->phy); } static int intel_pcie_remove(struct platform_device *pdev) @@ -380,7 +365,7 @@ static int __maybe_unused intel_pcie_suspend_noirq(struct device *dev) if (ret) return ret; - intel_pcie_deinit_phy(lpp); + phy_exit(lpp->phy); clk_disable_unprepare(lpp->core_clk); return ret; } -- 2.25.1
[PATCH 03/13] PCI: dwc: Move "dbi", "dbi2", and "addr_space" resource setup into common code
Most DWC drivers use the common register resource names "dbi", "dbi2", and "addr_space", so let's move their setup into the DWC common code. This means 'dbi_base' in particular is setup later, but it looks like no drivers touch DBI registers before dw_pcie_host_init or dw_pcie_ep_init. Cc: Kishon Vijay Abraham I Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Murali Karicheri Cc: Minghuan Lian Cc: Mingkai Hu Cc: Roy Zang Cc: Jonathan Chocron Cc: Jesper Nilsson Cc: Jingoo Han Cc: Gustavo Pimentel Cc: Xiaowei Song Cc: Binghui Wang Cc: Andy Gross Cc: Bjorn Andersson Cc: Stanimir Varbanov Cc: Pratyush Anand Cc: Thierry Reding Cc: Jonathan Hunter Cc: Kunihiko Hayashi Cc: Masahiro Yamada Cc: linux-o...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-arm-ker...@axis.com Cc: linux-arm-...@vger.kernel.org Cc: linux-te...@vger.kernel.org Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-dra7xx.c | 8 drivers/pci/controller/dwc/pci-keystone.c | 29 +--- .../pci/controller/dwc/pci-layerscape-ep.c| 37 +-- drivers/pci/controller/dwc/pcie-al.c | 9 +--- drivers/pci/controller/dwc/pcie-artpec6.c | 43 ++ .../pci/controller/dwc/pcie-designware-ep.c | 29 ++-- .../pci/controller/dwc/pcie-designware-host.c | 7 +++ .../pci/controller/dwc/pcie-designware-plat.c | 45 +-- drivers/pci/controller/dwc/pcie-intel-gw.c| 4 -- drivers/pci/controller/dwc/pcie-kirin.c | 5 --- drivers/pci/controller/dwc/pcie-qcom.c| 8 drivers/pci/controller/dwc/pcie-spear13xx.c | 11 + drivers/pci/controller/dwc/pcie-tegra194.c| 22 - drivers/pci/controller/dwc/pcie-uniphier-ep.c | 38 +--- drivers/pci/controller/dwc/pcie-uniphier.c| 6 --- 15 files changed, 47 insertions(+), 254 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c index 6d012d2b1e90..a4aabc85dbb1 100644 --- a/drivers/pci/controller/dwc/pci-dra7xx.c +++ b/drivers/pci/controller/dwc/pci-dra7xx.c @@ -578,7 +578,6 @@ static int __init dra7xx_add_pcie_ep(struct dra7xx_pcie *dra7xx, { 
int ret; struct dw_pcie_ep *ep; - struct resource *res; struct device *dev = >dev; struct dw_pcie *pci = dra7xx->pci; @@ -594,13 +593,6 @@ static int __init dra7xx_add_pcie_ep(struct dra7xx_pcie *dra7xx, if (IS_ERR(pci->dbi_base2)) return PTR_ERR(pci->dbi_base2); - res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space"); - if (!res) - return -EINVAL; - - ep->phys_base = res->start; - ep->addr_size = resource_size(res); - ret = dw_pcie_ep_init(ep); if (ret) { dev_err(dev, "failed to initialize endpoint\n"); diff --git a/drivers/pci/controller/dwc/pci-keystone.c b/drivers/pci/controller/dwc/pci-keystone.c index a222728238ca..9cf14f13798b 100644 --- a/drivers/pci/controller/dwc/pci-keystone.c +++ b/drivers/pci/controller/dwc/pci-keystone.c @@ -977,33 +977,6 @@ static const struct dw_pcie_ep_ops ks_pcie_am654_ep_ops = { .get_features = _pcie_am654_get_features, }; -static int __init ks_pcie_add_pcie_ep(struct keystone_pcie *ks_pcie, - struct platform_device *pdev) -{ - int ret; - struct dw_pcie_ep *ep; - struct resource *res; - struct device *dev = >dev; - struct dw_pcie *pci = ks_pcie->pci; - - ep = >ep; - - res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space"); - if (!res) - return -EINVAL; - - ep->phys_base = res->start; - ep->addr_size = resource_size(res); - - ret = dw_pcie_ep_init(ep); - if (ret) { - dev_err(dev, "failed to initialize endpoint\n"); - return ret; - } - - return 0; -} - static void ks_pcie_disable_phy(struct keystone_pcie *ks_pcie) { int num_lanes = ks_pcie->num_lanes; @@ -1313,7 +1286,7 @@ static int __init ks_pcie_probe(struct platform_device *pdev) } pci->ep.ops = ep_ops; - ret = ks_pcie_add_pcie_ep(ks_pcie, pdev); + ret = dw_pcie_ep_init(>ep); if (ret < 0) goto err_get_sync; break; diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c b/drivers/pci/controller/dwc/pci-layerscape-ep.c index 84206f265e54..4af031b3f0a0 100644 --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c +++ 
b/drivers/pci/controller/dwc/pci-layerscape-ep.c @@ -18,8 +18,6 @@ #include "pcie-designware.h" -#define PCIE_DBI2_OFFSET 0x1000 /* DBI2 base address*/ - #define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev) struct ls_pcie_ep_drvdata { @@ -124,34 +122,6 @@ static const struct of_device_id ls_pcie_ep_of_match[] = { { }, }; -static int __init
[PATCH 02/13] PCI: dwc/intel-gw: Move ATU offset out of driver match data
The ATU offset should be a register range in DT called 'atu', not driver match data. Any future platforms with a different ATU offset should add it to their DT. This is also in preparation to do DBI resource setup in the core DWC code, so let's move setting atu_base later in intel_pcie_rc_setup(). Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pcie-intel-gw.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/drivers/pci/controller/dwc/pcie-intel-gw.c b/drivers/pci/controller/dwc/pcie-intel-gw.c index 5650cb78acba..77ef88333115 100644 --- a/drivers/pci/controller/dwc/pcie-intel-gw.c +++ b/drivers/pci/controller/dwc/pcie-intel-gw.c @@ -58,7 +58,6 @@ struct intel_pcie_soc { unsigned intpcie_ver; - unsigned intpcie_atu_offset; u32 num_viewport; }; @@ -155,11 +154,15 @@ static void intel_pcie_init_n_fts(struct dw_pcie *pci) static void intel_pcie_rc_setup(struct intel_pcie_port *lpp) { + struct dw_pcie *pci = >pci; + + pci->atu_base = pci->dbi_base + 0xC; + intel_pcie_ltssm_disable(lpp); intel_pcie_link_setup(lpp); - intel_pcie_init_n_fts(>pci); - dw_pcie_setup_rc(>pci.pp); - dw_pcie_upconfig_setup(>pci); + intel_pcie_init_n_fts(pci); + dw_pcie_setup_rc(>pp); + dw_pcie_upconfig_setup(pci); } static int intel_pcie_ep_rst_init(struct intel_pcie_port *lpp) @@ -425,7 +428,6 @@ static const struct dw_pcie_host_ops intel_pcie_dw_ops = { static const struct intel_pcie_soc pcie_data = { .pcie_ver = 0x520A, - .pcie_atu_offset = 0xC, .num_viewport = 3, }; @@ -461,7 +463,6 @@ static int intel_pcie_probe(struct platform_device *pdev) pci->ops = _pcie_ops; pci->version = data->pcie_ver; - pci->atu_base = pci->dbi_base + data->pcie_atu_offset; pp->ops = _pcie_dw_ops; ret = dw_pcie_host_init(pp); -- 2.25.1
[PATCH 01/13] PCI: dwc/imx6: Drop setting PCI_MSI_FLAGS_ENABLE
No other host driver sets the PCI_MSI_FLAGS_ENABLE bit, so it must not be necessary. If it is, a comment is needed. Cc: Richard Zhu Cc: Lucas Stach Cc: Lorenzo Pieralisi Cc: Bjorn Helgaas Cc: Shawn Guo Cc: Sascha Hauer Cc: Pengutronix Kernel Team Cc: Fabio Estevam Cc: NXP Linux Team Signed-off-by: Rob Herring --- drivers/pci/controller/dwc/pci-imx6.c | 8 1 file changed, 8 deletions(-) diff --git a/drivers/pci/controller/dwc/pci-imx6.c b/drivers/pci/controller/dwc/pci-imx6.c index 5cf1ef12fb9b..7dd137d62dca 100644 --- a/drivers/pci/controller/dwc/pci-imx6.c +++ b/drivers/pci/controller/dwc/pci-imx6.c @@ -1002,7 +1002,6 @@ static int imx6_pcie_probe(struct platform_device *pdev) struct resource *dbi_base; struct device_node *node = dev->of_node; int ret; - u16 val; imx6_pcie = devm_kzalloc(dev, sizeof(*imx6_pcie), GFP_KERNEL); if (!imx6_pcie) @@ -1167,13 +1166,6 @@ static int imx6_pcie_probe(struct platform_device *pdev) if (ret < 0) return ret; - if (pci_msi_enabled()) { - u8 offset = dw_pcie_find_capability(pci, PCI_CAP_ID_MSI); - val = dw_pcie_readw_dbi(pci, offset + PCI_MSI_FLAGS); - val |= PCI_MSI_FLAGS_ENABLE; - dw_pcie_writew_dbi(pci, offset + PCI_MSI_FLAGS, val); - } - return 0; } -- 2.25.1
[PATCH 00/13] PCI: dwc: Another round of clean-ups
Here's another batch of DWC PCI host refactoring. This series primarily moves more of the MSI, link up, and resource handling to the core code. No doubt I've probably broken something. Please test. A git branch is here[1]. Rob [1] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git pci-more-dwc-cleanup Rob Herring (13): PCI: dwc/imx6: Drop setting PCI_MSI_FLAGS_ENABLE PCI: dwc/intel-gw: Move ATU offset out of driver match data PCI: dwc: Move "dbi", "dbi2", and "addr_space" resource setup into common code PCI: dwc/intel-gw: Remove some unneeded function wrappers PCI: dwc: Ensure all outbound ATU windows are reset PCI: dwc/dra7xx: Use the common MSI irq_chip PCI: dwc: Drop the .set_num_vectors() host op PCI: dwc: Move MSI interrupt setup into DWC common code PCI: dwc: Rework MSI initialization PCI: dwc: Move link handling into common code PCI: dwc: Move dw_pcie_msi_init() into core PCI: dwc: Move dw_pcie_setup_rc() to DWC common code PCI: dwc: Remove unnecessary wrappers around dw_pcie_host_init() drivers/pci/controller/dwc/pci-dra7xx.c | 141 +- drivers/pci/controller/dwc/pci-exynos.c | 50 ++- drivers/pci/controller/dwc/pci-imx6.c | 51 +-- drivers/pci/controller/dwc/pci-keystone.c | 68 + .../pci/controller/dwc/pci-layerscape-ep.c| 37 + drivers/pci/controller/dwc/pci-layerscape.c | 67 + drivers/pci/controller/dwc/pci-meson.c| 53 ++- drivers/pci/controller/dwc/pcie-al.c | 29 +--- drivers/pci/controller/dwc/pcie-armada8k.c| 37 ++--- drivers/pci/controller/dwc/pcie-artpec6.c | 76 +- .../pci/controller/dwc/pcie-designware-ep.c | 29 +++- .../pci/controller/dwc/pcie-designware-host.c | 80 ++ .../pci/controller/dwc/pcie-designware-plat.c | 70 + drivers/pci/controller/dwc/pcie-designware.h | 12 +- drivers/pci/controller/dwc/pcie-histb.c | 37 ++--- drivers/pci/controller/dwc/pcie-intel-gw.c| 59 ++-- drivers/pci/controller/dwc/pcie-kirin.c | 62 +--- drivers/pci/controller/dwc/pcie-qcom.c| 38 + drivers/pci/controller/dwc/pcie-spear13xx.c | 62 +++- 
drivers/pci/controller/dwc/pcie-tegra194.c| 40 + drivers/pci/controller/dwc/pcie-uniphier-ep.c | 38 + drivers/pci/controller/dwc/pcie-uniphier.c| 51 +-- 22 files changed, 217 insertions(+), 970 deletions(-) -- 2.25.1
Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
On Wed, 2020-10-28 at 13:09 +0200, Mike Rapoport wrote: > On Tue, Oct 27, 2020 at 09:46:35AM +0100, David Hildenbrand wrote: > > On 27.10.20 09:38, Mike Rapoport wrote: > > > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P > > > wrote: > > > > > > > Beyond whatever you are seeing, for the latter case of new > > > > things > > > > getting introduced to an interface with hidden dependencies... > > > > Another > > > > edge case could be a new caller to set_memory_np() could result > > > > in > > > > large NP pages. None of the callers today should cause this > > > > AFAICT, but > > > > it's not great to rely on the callers to know these details. > > > A caller of set_memory_*() or set_direct_map_*() should expect a > > > failure > > > and be ready for that. So adding a WARN to safe_copy_page() is > > > the first > > > step in that direction :) > > > > > > > I am probably missing something important, but why are we > > saving/restoring > > the content of pages that were explicitly removed from the identity > > mapping > > such that nobody will access them? > > Actually, we should not be saving/restoring free pages during > hibernation as there are several calls to mark_free_pages() that > should > exclude the free pages from the snapshot. I've tried to find why the > fix > that maps/unmaps a page to save it was required at the first place, > but > I could not find bug reports. > > The closest I've got is an email from Rafael that asked to update > "hibernate: handle DEBUG_PAGEALLOC" patch: > > https://lore.kernel.org/linux-pm/200802200133.44098@sisk.pl/ > > Could it be that safe_copy_page() tries to workaround a non-existent > problem? It looks like inside page_alloc.c it unmaps the page before it actually frees it, so to hibernate it could look like the page is still allocated even though it's unmapped? Maybe that small window is what it cared about initially. There is also now the vmalloc case, which I am actually working on expanding. 
So I think the re-mapping logic is needed.
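The point Rick and Mike are circling — that callers of the direct-map primitives cannot be relied on to know these hidden dependencies, so failures must be surfaced rather than ignored — can be sketched in userspace. All names below are made up for the illustration (this is not the kernel API); the fallible primitive stands in for a set_direct_map_*() call that can fail when splitting a huge mapping needs a page-table allocation:

```c
#include <stdio.h>

/* Stand-in for a set_direct_map_*() style primitive: it can fail
 * (e.g. -ENOMEM while splitting a large page), so it returns an error
 * code instead of being void. */
static int set_direct_map_invalid(void *page, int split_fails)
{
	if (split_fails)
		return -12;	/* -ENOMEM: page-table split failed */
	return 0;
}

/* Stand-in for a safe_copy_page()-like caller: check the return value
 * and warn instead of silently assuming the mapping change succeeded. */
static int safe_copy(void *dst, void *src, int split_fails)
{
	int err = set_direct_map_invalid(src, split_fails);

	if (err) {
		fprintf(stderr, "WARN: direct-map update failed (%d)\n", err);
		return err;	/* surface the failure to the caller */
	}
	/* ... copy the page here ... */
	return 0;
}
```

This is the shape of the "add a WARN to safe_copy_page()" first step discussed above: the error path exists and is propagated, rather than the primitive being treated as infallible.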
[PATCH] powerpc/smp: Move rcu_cpu_starting() earlier
The call to rcu_cpu_starting() in start_secondary() is not early enough in the CPU-hotplug onlining process, which results in lockdep splats as follows:

 WARNING: suspicious RCU usage
 kernel/locking/lockdep.c:3497 RCU-list traversed in non-reader section!!

 other info that might help us debug this:

 RCU used illegally from offline CPU!
 rcu_scheduler_active = 1, debug_locks = 1
 no locks held by swapper/1/0.

 Call Trace:
 dump_stack+0xec/0x144 (unreliable)
 lockdep_rcu_suspicious+0x128/0x14c
 __lock_acquire+0x1060/0x1c60
 lock_acquire+0x140/0x5f0
 _raw_spin_lock_irqsave+0x64/0xb0
 clockevents_register_device+0x74/0x270
 register_decrementer_clockevent+0x94/0x110
 start_secondary+0x134/0x800
 start_secondary_prolog+0x10/0x14

This is avoided by moving the call to rcu_cpu_starting() up near the beginning of the start_secondary() function. Note that the raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers.

Link: https://lore.kernel.org/lkml/160223032121.7002.1269740091547117869.tip-bot2@tip-bot2/
Signed-off-by: Qian Cai
---
 arch/powerpc/kernel/smp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 3c6b9822f978..8c2857cbd960 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1393,13 +1393,14 @@ static void add_cpu_to_masks(int cpu)
 /* Activate a secondary processor. */
 void start_secondary(void *unused)
 {
-	unsigned int cpu = smp_processor_id();
+	unsigned int cpu = raw_smp_processor_id();

 	mmgrab(&init_mm);
 	current->active_mm = &init_mm;

 	smp_store_cpu_info(cpu);
 	set_dec(tb_ticks_per_jiffy);
+	rcu_cpu_starting(cpu);
 	preempt_disable();
 	cpu_callin_map[cpu] = 1;
--
2.28.0
Re: [PATCH kernel v3 1/2] dma: Allow mixing bypass and mapped DMA operation
On Wed, Oct 28, 2020 at 06:00:29PM +1100, Alexey Kardashevskiy wrote: > At the moment we allow bypassing DMA ops only when we can do this for > the entire RAM. However there are configs with mixed type memory > where we could still allow bypassing IOMMU in most cases; > POWERPC with persistent memory is one example. > > This adds an arch hook to determine where bypass can still work and > we invoke direct DMA API. The following patch checks the bus limit > on POWERPC to allow or disallow direct mapping. > > This adds a CONFIG_ARCH_HAS_DMA_MAP_DIRECT config option to make the arch_* > hooks no-ops by default. > > Signed-off-by: Alexey Kardashevskiy > --- > kernel/dma/mapping.c | 24 > kernel/dma/Kconfig | 4 > 2 files changed, 24 insertions(+), 4 deletions(-) > > diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c > index 51bb8fa8eb89..a0bc9eb876ed 100644 > --- a/kernel/dma/mapping.c > +++ b/kernel/dma/mapping.c > @@ -137,6 +137,18 @@ static inline bool dma_map_direct(struct device *dev, > return dma_go_direct(dev, *dev->dma_mask, ops); > } > > +#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT > +bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr); > +bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle); > +bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, int > nents); > +bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, > int nents); > +#else > +#define arch_dma_map_page_direct(d, a) (0) > +#define arch_dma_unmap_page_direct(d, a) (0) > +#define arch_dma_map_sg_direct(d, s, n) (0) > +#define arch_dma_unmap_sg_direct(d, s, n) (0) > +#endif A bunch of overly long lines here. Except for that this looks ok to me. If you want me to queue up the series I can just fix it up.
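The `#ifdef`/`#define ... (0)` stanza in the patch above is a standard kernel pattern: an optional arch hook that compiles to a constant-false no-op unless the architecture opts in, so the common path pays nothing. A minimal userspace sketch of the same pattern (config name and hook are invented for the illustration, not the real kernel symbols):

```c
#include <stdbool.h>

/* Optional arch hook: when MY_ARCH_HAS_MAP_DIRECT is not defined, the
 * hook is a constant false and the compiler removes the branch entirely. */
#ifdef MY_ARCH_HAS_MAP_DIRECT
bool arch_map_direct(unsigned long addr);
#else
#define arch_map_direct(a) (false)
#endif

static int mapped_path_taken;	/* records which path map_page() took */

static unsigned long map_page(unsigned long addr)
{
	if (arch_map_direct(addr))
		return addr;		/* bypass: identity/direct mapping */
	mapped_path_taken = 1;		/* fall back to the translated path */
	return addr + 0x1000;		/* pretend this is a translated IOVA */
}
```

With the config symbol undefined, every call falls through to the translated path and the hook costs nothing, which is exactly why the patch guards the real hooks this way.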
Re: [PATCH kernel v2 1/2] dma: Allow mixing bypass and normal IOMMU operation
On Wed, Oct 28, 2020 at 05:55:23PM +1100, Alexey Kardashevskiy wrote: > > It is passing an address of the end of the mapped area so passing a page > struct means passing page and offset which is an extra parameter and we do > not want to do anything with the page in those hooks anyway so I'd keep it > as is. > > >> and >> maybe even hide the dma_map_direct inside it. > > Call dma_map_direct() from arch_dma_map_page_direct() if > arch_dma_map_page_direct() is defined? Seems suboptimal as it is going to > be bypass=true in most cases and we save one call by avoiding calling > arch_dma_map_page_direct(). Unless I missed something? C does not even evaluate the right hand side of a || expression if the left hand evaluates to true.
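Christoph's closing remark carries the whole argument: because `||` short-circuits in C, putting the cheap check on the left means the expression on the right is never evaluated on the fast path, so there is no extra call to save. A small standalone demonstration:

```c
static int hook_calls;	/* counts evaluations of the right-hand side */

static int expensive_hook(void)
{
	hook_calls++;
	return 1;
}

static int combined(int fast_path)
{
	/* expensive_hook() only runs when fast_path is false:
	 * `a || b` never evaluates b when a is already true. */
	return fast_path || expensive_hook();
}
```

So `dma_map_direct(...) || arch_dma_map_page_direct(...)` costs nothing extra when the left-hand check already says bypass.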
[PATCH] powerpc/eeh_cache: Fix a possible debugfs deadlock
Lockdep complains about a possible deadlock in eeh_addr_cache_show(): it acquires a lock with IRQs enabled, but eeh_addr_cache_insert_dev() needs to acquire the same lock with IRQs disabled. Let's just make eeh_addr_cache_show() acquire the lock with IRQs disabled as well.

       CPU0                    CPU1
  lock(&pci_io_addr_cache_root.piar_lock);
                               local_irq_disable();
                               lock(>lock);
                               lock(&pci_io_addr_cache_root.piar_lock);
  lock(>lock);

 *** DEADLOCK ***

 lock_acquire+0x140/0x5f0
 _raw_spin_lock_irqsave+0x64/0xb0
 eeh_addr_cache_insert_dev+0x48/0x390
 eeh_probe_device+0xb8/0x1a0
 pnv_pcibios_bus_add_device+0x3c/0x80
 pcibios_bus_add_device+0x118/0x290
 pci_bus_add_device+0x28/0xe0
 pci_bus_add_devices+0x54/0xb0
 pcibios_init+0xc4/0x124
 do_one_initcall+0xac/0x528
 kernel_init_freeable+0x35c/0x3fc
 kernel_init+0x24/0x148
 ret_from_kernel_thread+0x5c/0x80

 lock_acquire+0x140/0x5f0
 _raw_spin_lock+0x4c/0x70
 eeh_addr_cache_show+0x38/0x110
 seq_read+0x1a0/0x660
 vfs_read+0xc8/0x1f0
 ksys_read+0x74/0x130
 system_call_exception+0xf8/0x1d0
 system_call_common+0xe8/0x218

Fixes: 5ca85ae6318d ("powerpc/eeh_cache: Add a way to dump the EEH address cache")
Signed-off-by: Qian Cai
---
 arch/powerpc/kernel/eeh_cache.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/eeh_cache.c b/arch/powerpc/kernel/eeh_cache.c
index 6b50bf15d8c1..bf3270426d82 100644
--- a/arch/powerpc/kernel/eeh_cache.c
+++ b/arch/powerpc/kernel/eeh_cache.c
@@ -264,8 +264,9 @@ static int eeh_addr_cache_show(struct seq_file *s, void *v)
 {
 	struct pci_io_addr_range *piar;
 	struct rb_node *n;
+	unsigned long flags;

-	spin_lock(&pci_io_addr_cache_root.piar_lock);
+	spin_lock_irqsave(&pci_io_addr_cache_root.piar_lock, flags);
 	for (n = rb_first(&pci_io_addr_cache_root.rb_root); n; n = rb_next(n)) {
 		piar = rb_entry(n, struct pci_io_addr_range, rb_node);

@@ -273,7 +274,7 @@ static int eeh_addr_cache_show(struct seq_file *s, void *v)
 			   (piar->flags & IORESOURCE_IO) ? "i/o" : "mem",
 			   &piar->addr_lo, &piar->addr_hi, pci_name(piar->pcidev));
 	}
-	spin_unlock(&pci_io_addr_cache_root.piar_lock);
+	spin_unlock_irqrestore(&pci_io_addr_cache_root.piar_lock, flags);

 	return 0;
 }
--
2.28.0
[PATCH 19/33] docs: ABI: stable: make files ReST compatible
From: Mauro Carvalho Chehab Several entries at the stable ABI files won't parse if we pass them directly to the ReST output. Adjust them, in order to allow adding their contents as-is at the stable ABI book. Signed-off-by: Mauro Carvalho Chehab Signed-off-by: Mauro Carvalho Chehab --- Documentation/ABI/stable/firewire-cdev| 4 + Documentation/ABI/stable/sysfs-acpi-pmprofile | 22 +++-- Documentation/ABI/stable/sysfs-bus-firewire | 3 + Documentation/ABI/stable/sysfs-bus-nvmem | 19 ++-- Documentation/ABI/stable/sysfs-bus-usb| 6 +- .../ABI/stable/sysfs-class-backlight | 1 + .../ABI/stable/sysfs-class-infiniband | 93 +-- Documentation/ABI/stable/sysfs-class-rfkill | 13 ++- Documentation/ABI/stable/sysfs-class-tpm | 90 +- Documentation/ABI/stable/sysfs-devices| 5 +- Documentation/ABI/stable/sysfs-driver-ib_srp | 1 + .../ABI/stable/sysfs-firmware-efi-vars| 4 + .../ABI/stable/sysfs-firmware-opal-dump | 5 + .../ABI/stable/sysfs-firmware-opal-elog | 2 + Documentation/ABI/stable/sysfs-hypervisor-xen | 3 + Documentation/ABI/stable/vdso | 5 +- 16 files changed, 176 insertions(+), 100 deletions(-) diff --git a/Documentation/ABI/stable/firewire-cdev b/Documentation/ABI/stable/firewire-cdev index f72ed653878a..c9e8ff026154 100644 --- a/Documentation/ABI/stable/firewire-cdev +++ b/Documentation/ABI/stable/firewire-cdev @@ -14,12 +14,14 @@ Description: Each /dev/fw* is associated with one IEEE 1394 node, which can be remote or local nodes. Operations on a /dev/fw* file have different scope: + - The 1394 node which is associated with the file: - Asynchronous request transmission - Get the Configuration ROM - Query node ID - Query maximum speed of the path between this node and local node + - The 1394 bus (i.e. 
"card") to which the node is attached to: - Isochronous stream transmission and reception - Asynchronous stream transmission and reception @@ -31,6 +33,7 @@ Description: manager - Query cycle time - Bus reset initiation, bus reset event reception + - All 1394 buses: - Allocation of IEEE 1212 address ranges on the local link layers, reception of inbound requests to such @@ -43,6 +46,7 @@ Description: userland implement different access permission models, some operations are restricted to /dev/fw* files that are associated with a local node: + - Addition of descriptors or directories to the local nodes' Configuration ROM - PHY packet transmission and reception diff --git a/Documentation/ABI/stable/sysfs-acpi-pmprofile b/Documentation/ABI/stable/sysfs-acpi-pmprofile index 964c7a8afb26..fd97d22b677f 100644 --- a/Documentation/ABI/stable/sysfs-acpi-pmprofile +++ b/Documentation/ABI/stable/sysfs-acpi-pmprofile @@ -6,17 +6,21 @@ Description: The ACPI pm_profile sysfs interface exports the platform power management (and performance) requirement expectations as provided by BIOS. The integer value is directly passed as retrieved from the FADT ACPI table. -Values: For possible values see ACPI specification: + +Values:For possible values see ACPI specification: 5.2.9 Fixed ACPI Description Table (FADT) Field: Preferred_PM_Profile Currently these values are defined by spec: - 0 Unspecified - 1 Desktop - 2 Mobile - 3 Workstation - 4 Enterprise Server - 5 SOHO Server - 6 Appliance PC - 7 Performance Server + + == = + 0 Unspecified + 1 Desktop + 2 Mobile + 3 Workstation + 4 Enterprise Server + 5 SOHO Server + 6 Appliance PC + 7 Performance Server >7 Reserved + == = diff --git a/Documentation/ABI/stable/sysfs-bus-firewire b/Documentation/ABI/stable/sysfs-bus-firewire index 41e5a0cd1e3e..9ac9eddb82ef 100644 --- a/Documentation/ABI/stable/sysfs-bus-firewire +++ b/Documentation/ABI/stable/sysfs-bus-firewire @@ -47,6 +47,7 @@ Description: IEEE 1394 node device attribute. 
Read-only and immutable. Values:1: The sysfs entry represents a local node (a controller card). +
[PATCH 3/4] powerpc: Reintroduce is_kvm_guest in a new avatar
Introduce a static branch that would be set during boot if the OS happens to be a KVM guest. Subsequent checks to see if we are on KVM will rely on this static branch. This static branch would be used in vcpu_is_preempted in a subsequent patch. Signed-off-by: Srikar Dronamraju Cc: linuxppc-dev Cc: LKML Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: Gautham R Shenoy Cc: Peter Zijlstra Cc: Valentin Schneider Cc: Juri Lelli Cc: Waiman Long Cc: Phil Auld --- arch/powerpc/include/asm/kvm_guest.h | 10 ++ arch/powerpc/include/asm/kvm_para.h | 2 +- arch/powerpc/kernel/firmware.c | 3 +++ 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/kvm_guest.h b/arch/powerpc/include/asm/kvm_guest.h index ba8291e02ba9..627ba272e781 100644 --- a/arch/powerpc/include/asm/kvm_guest.h +++ b/arch/powerpc/include/asm/kvm_guest.h @@ -7,8 +7,18 @@ #define __POWERPC_KVM_GUEST_H__ #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST) +#include + +DECLARE_STATIC_KEY_FALSE(kvm_guest); + +static inline bool is_kvm_guest(void) +{ + return static_branch_unlikely(&kvm_guest); +} + bool check_kvm_guest(void); #else +static inline bool is_kvm_guest(void) { return false; } static inline bool check_kvm_guest(void) { return false; } #endif diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 6fba06b6cfdb..abe1b5e82547 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -14,7 +14,7 @@ static inline int kvm_para_available(void) { - return IS_ENABLED(CONFIG_KVM_GUEST) && check_kvm_guest(); + return IS_ENABLED(CONFIG_KVM_GUEST) && is_kvm_guest(); } static inline unsigned int kvm_arch_para_features(void) diff --git a/arch/powerpc/kernel/firmware.c b/arch/powerpc/kernel/firmware.c index 61243267d4cf..28498fc573f2 100644 --- a/arch/powerpc/kernel/firmware.c +++ b/arch/powerpc/kernel/firmware.c @@ -14,6 +14,7 @@ #include #include +#include #ifdef CONFIG_PPC64 unsigned long
powerpc_firmware_features __read_mostly; @@ -21,6 +22,7 @@ EXPORT_SYMBOL_GPL(powerpc_firmware_features); #endif #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST) +DEFINE_STATIC_KEY_FALSE(kvm_guest); bool check_kvm_guest(void) { struct device_node *hyper_node; @@ -32,6 +34,7 @@ bool check_kvm_guest(void) if (!of_device_is_compatible(hyper_node, "linux,kvm")) return 0; + static_branch_enable(&kvm_guest); return 1; } #endif -- 2.18.4
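The shape of the mechanism in this patch — an expensive one-time probe at boot that flips a key consulted by a cheap inline predicate — can be sketched in userspace with a plain flag. (The real static_branch_unlikely() additionally patches the branch into the instruction stream, which a flag cannot model; the probe here takes a parameter instead of reading the device tree, purely for the sketch.)

```c
#include <stdbool.h>

static bool kvm_guest_flag;	/* stands in for DEFINE_STATIC_KEY_FALSE(kvm_guest) */

/* Cheap predicate, called from hot paths such as vcpu_is_preempted(). */
static bool is_kvm_guest(void)
{
	return kvm_guest_flag;	/* stands in for static_branch_unlikely(&kvm_guest) */
}

/* Expensive one-time detection, done once at boot. In the kernel this
 * walks the device tree looking for a "linux,kvm"-compatible node. */
static bool check_kvm_guest(int hypervisor_is_kvm)
{
	if (!hypervisor_is_kvm)
		return false;
	kvm_guest_flag = true;	/* stands in for static_branch_enable(&kvm_guest) */
	return true;
}
```

After boot-time detection runs once, every later is_kvm_guest() check is essentially free — which is the whole point of replacing the repeated check_kvm_guest() device-tree walk.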
[PATCH 4/4] powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted
If it's a shared LPAR but not a KVM guest, then see if the vCPU is related to the calling vCPU. On PowerVM, only cores can be preempted. So if one vCPU is in a non-preempted state, we can infer that all other vCPUs sharing the same core are also in a non-preempted state. Signed-off-by: Srikar Dronamraju Cc: linuxppc-dev Cc: LKML Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: Gautham R Shenoy Cc: Peter Zijlstra Cc: Valentin Schneider Cc: Juri Lelli Cc: Waiman Long Cc: Phil Auld --- arch/powerpc/include/asm/paravirt.h | 18 ++ 1 file changed, 18 insertions(+) diff --git a/arch/powerpc/include/asm/paravirt.h b/arch/powerpc/include/asm/paravirt.h index 9362c94fe3aa..edc08f04aef7 100644 --- a/arch/powerpc/include/asm/paravirt.h +++ b/arch/powerpc/include/asm/paravirt.h @@ -10,6 +10,9 @@ #endif #ifdef CONFIG_PPC_SPLPAR +#include +#include + DECLARE_STATIC_KEY_FALSE(shared_processor); static inline bool is_shared_processor(void) @@ -74,6 +77,21 @@ static inline bool vcpu_is_preempted(int cpu) { if (!is_shared_processor()) return false; + +#ifdef CONFIG_PPC_SPLPAR + if (!is_kvm_guest()) { + int first_cpu = cpu_first_thread_sibling(smp_processor_id()); + + /* +* Preemption can only happen at core granularity. This CPU +* is not preempted if one of the CPU of this core is not +* preempted. +*/ + if (cpu_first_thread_sibling(cpu) == first_cpu) + return false; + } +#endif + if (yield_count_of(cpu) & 1) return true; return false; -- 2.18.4
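The sibling comparison in the hunk above works because cpu_first_thread_sibling() maps every hardware thread of a core to the same number, so two CPUs share a core iff those values match. A sketch assuming a fixed SMT8 core (the kernel derives threads_per_core at boot rather than hard-coding it as done here):

```c
#define THREADS_PER_CORE 8	/* assumption for the sketch: SMT8 */

/* Reduce a CPU number to the first thread of its core, mirroring what
 * cpu_first_thread_sibling() computes for a power-of-two SMT width. */
static int first_thread_sibling(int cpu)
{
	return cpu & ~(THREADS_PER_CORE - 1);
}

static int same_core(int a, int b)
{
	return first_thread_sibling(a) == first_thread_sibling(b);
}
```

Since the calling vCPU is obviously running (not preempted), and PowerVM preempts whole cores, any target CPU for which same_core(target, caller) holds can be reported as not preempted without consulting its yield count.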
[PATCH 1/4] powerpc: Refactor is_kvm_guest declaration to new header
Only code/declaration movement, in anticipation of doing a kvm-aware vcpu_is_preempted. No additional changes. Signed-off-by: Srikar Dronamraju Cc: linuxppc-dev Cc: LKML Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: Gautham R Shenoy Cc: Peter Zijlstra Cc: Valentin Schneider Cc: Juri Lelli Cc: Waiman Long Cc: Phil Auld --- arch/powerpc/include/asm/firmware.h | 6 -- arch/powerpc/include/asm/kvm_guest.h | 15 +++ arch/powerpc/include/asm/kvm_para.h | 2 +- arch/powerpc/platforms/pseries/smp.c | 1 + 4 files changed, 17 insertions(+), 7 deletions(-) create mode 100644 arch/powerpc/include/asm/kvm_guest.h diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h index 0b295bdb201e..aa6a5ef5d483 100644 --- a/arch/powerpc/include/asm/firmware.h +++ b/arch/powerpc/include/asm/firmware.h @@ -134,12 +134,6 @@ extern int ibm_nmi_interlock_token; extern unsigned int __start___fw_ftr_fixup, __stop___fw_ftr_fixup; -#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST) -bool is_kvm_guest(void); -#else -static inline bool is_kvm_guest(void) { return false; } -#endif - #ifdef CONFIG_PPC_PSERIES void pseries_probe_fw_features(void); #else diff --git a/arch/powerpc/include/asm/kvm_guest.h b/arch/powerpc/include/asm/kvm_guest.h new file mode 100644 index ..c0ace884a0e8 --- /dev/null +++ b/arch/powerpc/include/asm/kvm_guest.h @@ -0,0 +1,15 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* + * Copyright (C) 2020 IBM Corporation + */ + +#ifndef __POWERPC_KVM_GUEST_H__ +#define __POWERPC_KVM_GUEST_H__ + +#if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST) +bool is_kvm_guest(void); +#else +static inline bool is_kvm_guest(void) { return false; } +#endif + +#endif /* __POWERPC_KVM_GUEST_H__ */ diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 744612054c94..abe1b5e82547 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -8,7 +8,7 @@ #ifndef 
__POWERPC_KVM_PARA_H__ #define __POWERPC_KVM_PARA_H__ -#include +#include #include diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c index 92922491a81c..d578732c545d 100644 --- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -42,6 +42,7 @@ #include #include #include +#include #include "pseries.h" -- 2.18.4
[PATCH 0/4] Powerpc: Better preemption for shared processor
Currently, vcpu_is_preempted will return the yield_count for shared_processor. On a PowerVM LPAR, Phyp schedules at the SMT8 core boundary, i.e. all CPUs belonging to a core are either group scheduled in or group scheduled out. This can be used to better predict non-preempted CPUs on PowerVM shared LPARs. perf stat -r 5 -a perf bench sched pipe -l 1000 (lower time is better) powerpc/next 35,107,951.20 msec cpu-clock # 255.898 CPUs utilized ( +- 0.31% ) 23,655,348 context-switches #0.674 K/sec ( +- 3.72% ) 14,465 cpu-migrations#0.000 K/sec ( +- 5.37% ) 82,463 page-faults #0.002 K/sec ( +- 8.40% ) 1,127,182,328,206 cycles#0.032 GHz ( +- 1.60% ) (66.67%) 78,587,300,622 stalled-cycles-frontend #6.97% frontend cycles idle ( +- 0.08% ) (50.01%) 654,124,218,432 stalled-cycles-backend# 58.03% backend cycles idle ( +- 1.74% ) (50.01%) 834,013,059,242 instructions #0.74 insn per cycle #0.78 stalled cycles per insn ( +- 0.73% ) (66.67%) 132,911,454,387 branches #3.786 M/sec ( +- 0.59% ) (50.00%) 2,890,882,143 branch-misses #2.18% of all branches ( +- 0.46% ) (50.00%) 137.195 +- 0.419 seconds time elapsed ( +- 0.31% ) powerpc/next + patchset 29,981,702.64 msec cpu-clock # 255.881 CPUs utilized ( +- 1.30% ) 40,162,456 context-switches #0.001 M/sec ( +- 0.01% ) 1,110 cpu-migrations#0.000 K/sec ( +- 5.20% ) 62,616 page-faults #0.002 K/sec ( +- 3.93% ) 1,430,030,626,037 cycles#0.048 GHz ( +- 1.41% ) (66.67%) 83,202,707,288 stalled-cycles-frontend #5.82% frontend cycles idle ( +- 0.75% ) (50.01%) 744,556,088,520 stalled-cycles-backend# 52.07% backend cycles idle ( +- 1.39% ) (50.01%) 940,138,418,674 instructions #0.66 insn per cycle #0.79 stalled cycles per insn ( +- 0.51% ) (66.67%) 146,452,852,283 branches #4.885 M/sec ( +- 0.80% ) (50.00%) 3,237,743,996 branch-misses #2.21% of all branches ( +- 1.18% ) (50.01%) 117.17 +- 1.52 seconds time elapsed ( +- 1.30% ) This is around a 14.6% improvement in performance. 
Cc: linuxppc-dev Cc: LKML Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: Gautham R Shenoy Cc: Peter Zijlstra Cc: Valentin Schneider Cc: Juri Lelli Cc: Waiman Long Cc: Phil Auld Srikar Dronamraju (4): powerpc: Refactor is_kvm_guest declaration to new header powerpc: Rename is_kvm_guest to check_kvm_guest powerpc: Reintroduce is_kvm_guest powerpc/paravirt: Use is_kvm_guest in vcpu_is_preempted arch/powerpc/include/asm/firmware.h | 6 -- arch/powerpc/include/asm/kvm_guest.h | 25 + arch/powerpc/include/asm/kvm_para.h | 2 +- arch/powerpc/include/asm/paravirt.h | 18 ++ arch/powerpc/kernel/firmware.c | 5 - arch/powerpc/platforms/pseries/smp.c | 3 ++- 6 files changed, 50 insertions(+), 9 deletions(-) create mode 100644 arch/powerpc/include/asm/kvm_guest.h -- 2.18.4
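The quoted 14.6% figure checks out against the elapsed times in the cover letter (137.195 s before, 117.17 s after):

```c
/* Relative improvement from the before/after elapsed times:
 * (137.195 - 117.17) / 137.195 * 100 ≈ 14.6% */
static double improvement_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}
```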
[PATCH 2/4] powerpc: Rename is_kvm_guest to check_kvm_guest
is_kvm_guest() will be reused in a subsequent patch in a new avatar. Hence rename is_kvm_guest to check_kvm_guest. No additional changes. Signed-off-by: Srikar Dronamraju Cc: linuxppc-dev Cc: LKML Cc: Michael Ellerman Cc: Nicholas Piggin Cc: Nathan Lynch Cc: Gautham R Shenoy Cc: Peter Zijlstra Cc: Valentin Schneider Cc: Juri Lelli Cc: Waiman Long Cc: Phil Auld --- arch/powerpc/include/asm/kvm_guest.h | 4 ++-- arch/powerpc/include/asm/kvm_para.h | 2 +- arch/powerpc/kernel/firmware.c | 2 +- arch/powerpc/platforms/pseries/smp.c | 2 +- 4 files changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_guest.h b/arch/powerpc/include/asm/kvm_guest.h index c0ace884a0e8..ba8291e02ba9 100644 --- a/arch/powerpc/include/asm/kvm_guest.h +++ b/arch/powerpc/include/asm/kvm_guest.h @@ -7,9 +7,9 @@ #define __POWERPC_KVM_GUEST_H__ #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST) -bool is_kvm_guest(void); +bool check_kvm_guest(void); #else -static inline bool is_kvm_guest(void) { return false; } +static inline bool check_kvm_guest(void) { return false; } #endif #endif /* __POWERPC_KVM_GUEST_H__ */ diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index abe1b5e82547..6fba06b6cfdb 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -14,7 +14,7 @@ static inline int kvm_para_available(void) { - return IS_ENABLED(CONFIG_KVM_GUEST) && is_kvm_guest(); + return IS_ENABLED(CONFIG_KVM_GUEST) && check_kvm_guest(); } static inline unsigned int kvm_arch_para_features(void) diff --git a/arch/powerpc/kernel/firmware.c b/arch/powerpc/kernel/firmware.c index fe48d319d490..61243267d4cf 100644 --- a/arch/powerpc/kernel/firmware.c +++ b/arch/powerpc/kernel/firmware.c @@ -21,7 +21,7 @@ EXPORT_SYMBOL_GPL(powerpc_firmware_features); #endif #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_KVM_GUEST) -bool is_kvm_guest(void) +bool check_kvm_guest(void) { struct device_node *hyper_node; diff 
--git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c index d578732c545d..c70b4be9f0a5 100644 --- a/arch/powerpc/platforms/pseries/smp.c +++ b/arch/powerpc/platforms/pseries/smp.c @@ -211,7 +211,7 @@ static __init void pSeries_smp_probe(void) if (!cpu_has_feature(CPU_FTR_SMT)) return; - if (is_kvm_guest()) { + if (check_kvm_guest()) { /* * KVM emulates doorbells by disabling FSCR[MSGP] so msgsndp * faults to the hypervisor which then reads the instruction -- 2.18.4
Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
On Wed, Oct 28, 2020 at 12:17:35PM +0100, David Hildenbrand wrote: > On 28.10.20 12:09, Mike Rapoport wrote: > > On Tue, Oct 27, 2020 at 09:46:35AM +0100, David Hildenbrand wrote: > > > On 27.10.20 09:38, Mike Rapoport wrote: > > > > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote: > > > > > > > > > Beyond whatever you are seeing, for the latter case of new things > > > > > getting introduced to an interface with hidden dependencies... Another > > > > > edge case could be a new caller to set_memory_np() could result in > > > > > large NP pages. None of the callers today should cause this AFAICT, > > > > > but > > > > > it's not great to rely on the callers to know these details. > > > > > > A caller of set_memory_*() or set_direct_map_*() should expect a failure > > > > and be ready for that. So adding a WARN to safe_copy_page() is the first > > > > step in that direction :) > > > > > > > > > > I am probably missing something important, but why are we saving/restoring > > > the content of pages that were explicitly removed from the identity > > > mapping > > > such that nobody will access them? > > > > Actually, we should not be saving/restoring free pages during > > hibernation as there are several calls to mark_free_pages() that should > > exclude the free pages from the snapshot. I've tried to find why the fix > > that maps/unmaps a page to save it was required at the first place, but > > I could not find bug reports. > > > > The closest I've got is an email from Rafael that asked to update > > "hibernate: handle DEBUG_PAGEALLOC" patch: > > > > https://lore.kernel.org/linux-pm/200802200133.44098@sisk.pl/ > > > > Could it be that safe_copy_page() tries to workaround a non-existent > > problem? > > > > Could be! Also see > > https://lkml.kernel.org/r/38de5bb0-5559-d069-0ce0-daec66ef2...@suse.cz > > which restores free page content based on more kernel parameters, not based > on the original content. 
Ah, after looking at it now, I've run a kernel with DEBUG_PAGEALLOC=y and CONFIG_INIT_ON_FREE_DEFAULT_ON=y and restore crashed nicely.
[ 27.210093] PM: Image successfully loaded
[ 27.226709] Disabling non-boot CPUs ...
[ 27.231208] smpboot: CPU 1 is now offline
[ 27.363926] kvm-clock: cpu 0, msr 5c889001, primary cpu clock, resume
[ 27.363995] BUG: unable to handle page fault for address: 9f7a40108000
[ 27.367996] #PF: supervisor write access in kernel mode
[ 27.369558] #PF: error_code(0x0002) - not-present page
[ 27.371098] PGD 5ca01067 P4D 5ca01067 PUD 5ca02067 PMD 5ca03067 PTE 800ff fef7060
[ 27.373421] Oops: 0002 [#1] SMP DEBUG_PAGEALLOC PTI
[ 27.374905] CPU: 0 PID: 1200 Comm: bash Not tainted 5.10.0-rc1 #5
[ 27.376700] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14 .0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 27.379879] RIP: 0010:clear_page_rep+0x7/0x10
[ 27.381218] Code: e8 be 88 75 00 44 89 e2 48 89 ee 48 89 df e8 60 ff ff ff c6 03 00 5b 5d 41 5c c3 cc cc cc cc cc cc cc cc b9 00 02 00 00 31 c0 48 ab c3 0f 1f 44 00 00 31 c0 b9 40 00 00 00 66 0f 1f 84 00 00
[ 27.386457] RSP: 0018:b6838046be08 EFLAGS: 00010046
[ 27.388011] RAX: RBX: 9f7a487c0ec0 RCX: 0200
[ 27.390082] RDX: 9f7a4c788000 RSI: RDI: 9f7a40108000
[ 27.392138] RBP: 8629c860 R08: R09: 0007
[ 27.394205] R10: 0004 R11: b6838046bbf8 R12:
[ 27.396271] R13: 9f7a419a62a0 R14: 0005 R15: 9f7a484f4da0
[ 27.398334] FS: 7fe0c3f6a700() GS:9f7abf80() knlGS:
[ 27.400717] CS: 0010 DS: ES: CR0: 80050033
[ 27.402432] CR2: 9f7a40108000 CR3: 0859a001 CR4: 00060ef0
[ 27.404485] Call Trace:
[ 27.405326] clear_free_pages+0xf5/0x150
[ 27.406568] hibernation_snapshot+0x390/0x3d0
[ 27.407908] hibernate+0xdb/0x240
[ 27.408978] state_store+0xd7/0xe0
[ 27.410078] kernfs_fop_write+0x10e/0x1a0
[ 27.411333] vfs_write+0xbb/0x210
[ 27.412423] ksys_write+0x9c/0xd0
[ 27.413488] do_syscall_64+0x33/0x40
[ 27.414636] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[PATCH 5/9] kprobes/ftrace: Add recursion protection to the ftrace callback
From: "Steven Rostedt (VMware)" If a ftrace callback does not supply its own recursion protection and does not set the RECURSION_SAFE flag in its ftrace_ops, then ftrace will make a helper trampoline to do so before calling the callback instead of just calling the callback directly. The default for ftrace_ops is going to assume recursion protection unless otherwise specified. Cc: Masami Hiramatsu Cc: Guo Ren Cc: "James E.J. Bottomley" Cc: Helge Deller Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Heiko Carstens Cc: Vasily Gorbik Cc: Christian Borntraeger Cc: Thomas Gleixner Cc: Borislav Petkov Cc: x...@kernel.org Cc: "H. Peter Anvin" Cc: "Naveen N. Rao" Cc: Anil S Keshavamurthy Cc: "David S. Miller" Cc: linux-c...@vger.kernel.org Cc: linux-par...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s...@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) --- arch/csky/kernel/probes/ftrace.c | 12 ++-- arch/parisc/kernel/ftrace.c | 13 +++-- arch/powerpc/kernel/kprobes-ftrace.c | 11 ++- arch/s390/kernel/ftrace.c| 13 +++-- arch/x86/kernel/kprobes/ftrace.c | 12 ++-- 5 files changed, 52 insertions(+), 9 deletions(-) diff --git a/arch/csky/kernel/probes/ftrace.c b/arch/csky/kernel/probes/ftrace.c index 5264763d05be..5eb2604fdf71 100644 --- a/arch/csky/kernel/probes/ftrace.c +++ b/arch/csky/kernel/probes/ftrace.c @@ -13,16 +13,21 @@ int arch_check_ftrace_location(struct kprobe *p) void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip, struct ftrace_ops *ops, struct pt_regs *regs) { + int bit; bool lr_saver = false; struct kprobe *p; struct kprobe_ctlblk *kcb; - /* Preempt is disabled by ftrace */ + bit = ftrace_test_recursion_trylock(); + if (bit < 0) + return; + + preempt_disable_notrace(); p = get_kprobe((kprobe_opcode_t *)ip); if (!p) { p = get_kprobe((kprobe_opcode_t *)(ip - MCOUNT_INSN_SIZE)); if (unlikely(!p) || kprobe_disabled(p)) - return; + goto out; lr_saver = true; } @@ -56,6 +61,9 @@ void 
kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip, */ __this_cpu_write(current_kprobe, NULL); } +out: + preempt_enable_notrace(); + ftrace_test_recursion_unlock(bit); } NOKPROBE_SYMBOL(kprobe_ftrace_handler); diff --git a/arch/parisc/kernel/ftrace.c b/arch/parisc/kernel/ftrace.c index 4bab21c71055..5f7742b225a5 100644 --- a/arch/parisc/kernel/ftrace.c +++ b/arch/parisc/kernel/ftrace.c @@ -208,13 +208,19 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip, { struct kprobe_ctlblk *kcb; struct kprobe *p = get_kprobe((kprobe_opcode_t *)ip); + int bit; - if (unlikely(!p) || kprobe_disabled(p)) + bit = ftrace_test_recursion_trylock(); + if (bit < 0) return; + preempt_disable_notrace(); + if (unlikely(!p) || kprobe_disabled(p)) + goto out; + if (kprobe_running()) { kprobes_inc_nmissed_count(p); - return; + goto out; } __this_cpu_write(current_kprobe, p); @@ -235,6 +241,9 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long parent_ip, } } __this_cpu_write(current_kprobe, NULL); +out: + preempt_enable_notrace(); + ftrace_test_recursion_unlock(bit); } NOKPROBE_SYMBOL(kprobe_ftrace_handler); diff --git a/arch/powerpc/kernel/kprobes-ftrace.c b/arch/powerpc/kernel/kprobes-ftrace.c index 972cb28174b2..5df8d50c65ae 100644 --- a/arch/powerpc/kernel/kprobes-ftrace.c +++ b/arch/powerpc/kernel/kprobes-ftrace.c @@ -18,10 +18,16 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip, { struct kprobe *p; struct kprobe_ctlblk *kcb; + int bit; + bit = ftrace_test_recursion_trylock(); + if (bit < 0) + return; + + preempt_disable_notrace(); p = get_kprobe((kprobe_opcode_t *)nip); if (unlikely(!p) || kprobe_disabled(p)) - return; + goto out; kcb = get_kprobe_ctlblk(); if (kprobe_running()) { @@ -52,6 +58,9 @@ void kprobe_ftrace_handler(unsigned long nip, unsigned long parent_nip, */ __this_cpu_write(current_kprobe, NULL); } +out: + preempt_enable_notrace(); + ftrace_test_recursion_unlock(bit); } 
NOKPROBE_SYMBOL(kprobe_ftrace_handler); diff --git a/arch/s390/kernel/ftrace.c b/arch/s390/kernel/ftrace.c index b388e87a08bf..88466d7fb6b2 100644 --- a/arch/s390/kernel/ftrace.c +++ b/arch/s390/kernel/ftrace.c @@ -202,13 +202,19 @@ void kprobe_ftrace_handler(unsigned long ip, unsigned long
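Every per-arch hunk in the patch above applies the same transformation: try to take the recursion guard, bail out if the handler is already active in this context, do the work with preemption disabled, and release the guard on the way out. A single-threaded userspace analogue of the ftrace_test_recursion_trylock()/unlock() pairing (greatly simplified — the real guard is a per-context bit and works across interrupt levels):

```c
static int in_handler;		/* stands in for the per-context recursion bit */

static int recursion_trylock(void)
{
	if (in_handler)
		return -1;	/* already inside the handler: caller must return */
	in_handler = 1;
	return 0;		/* the "bit" to hand back to unlock */
}

static void recursion_unlock(int bit)
{
	(void)bit;
	in_handler = 0;
}

static int handler_entries;	/* counts real (non-recursive) entries */

static void handler(void)
{
	int bit = recursion_trylock();

	if (bit < 0)
		return;		/* recursion detected: do nothing */
	handler_entries++;
	handler();		/* would recurse forever, but the guard stops it */
	recursion_unlock(bit);
}

static int run_handler(void)
{
	handler();
	return handler_entries;
}
```

The inner handler() call models a traced function being hit again from inside the callback; the guard turns what would be unbounded recursion into a single counted entry per outer invocation.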
Test Results: RE: powerpc: avoid broken GCC __attribute__((optimize))
Thanks for your contribution, unfortunately we've found some issues. Your patch was successfully applied on branch powerpc/merge (8cb17737940b156329cb5210669b9c9b23f4dd56) The test build-ppc64le reported the following: Build failed! Full log: https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/21048//artifact/linux/report.txt Here's a preview of the log: arch/powerpc/kernel/paca.c:244:25: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'setup_paca' 244 | void __nostackprotector setup_paca(struct paca_struct *new_paca) | ^~ make[2]: *** [scripts/Makefile.build:283: arch/powerpc/kernel/paca.o] Error 1 make[1]: *** [scripts/Makefile.build:500: arch/powerpc/kernel] Error 2 make[1]: *** Waiting for unfinished jobs make: *** [Makefile:1799: arch/powerpc] Error 2 make: *** Waiting for unfinished jobs The test build-ppc64be reported the following: Build failed! Full log: https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/21049//artifact/linux/report.txt Here's a preview of the log: arch/powerpc/kernel/paca.c:244:25: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'setup_paca' 244 | void __nostackprotector setup_paca(struct paca_struct *new_paca) | ^~ make[2]: *** [scripts/Makefile.build:283: arch/powerpc/kernel/paca.o] Error 1 make[1]: *** [scripts/Makefile.build:500: arch/powerpc/kernel] Error 2 make[1]: *** Waiting for unfinished jobs make: *** [Makefile:1799: arch/powerpc] Error 2 make: *** Waiting for unfinished jobs The test build-ppc64e reported the following: Build failed! 
Full log: https://openpower.xyz/job/snowpatch/job/snowpatch-linux-sparse/21050//artifact/linux/report.txt Here's a preview of the log: arch/powerpc/kernel/paca.c:244:25: error: expected '=', ',', ';', 'asm' or '__attribute__' before 'setup_paca' 244 | void __nostackprotector setup_paca(struct paca_struct *new_paca) | ^~ make[2]: *** [scripts/Makefile.build:283: arch/powerpc/kernel/paca.o] Error 1 make[1]: *** [scripts/Makefile.build:500: arch/powerpc/kernel] Error 2 make[1]: *** Waiting for unfinished jobs make: *** [Makefile:1799: arch/powerpc] Error 2 make: *** Waiting for unfinished jobs
Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
On Wed, Oct 28, 2020 at 11:20:12AM +, Will Deacon wrote: > On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote: > > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote: > > > On Mon, 2020-10-26 at 11:05 +0200, Mike Rapoport wrote: > > > > On Mon, Oct 26, 2020 at 01:13:52AM +, Edgecombe, Rick P wrote: > > > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote: > > > > > > Indeed, for architectures that define > > > > > > CONFIG_ARCH_HAS_SET_DIRECT_MAP > > > > > > it is > > > > > > possible that __kernel_map_pages() would fail, but since this > > > > > > function is > > > > > > void, the failure will go unnoticed. > > > > > > > > > > Could you elaborate on how this could happen? Do you mean during > > > > > runtime today or if something new was introduced? > > > > > > > > A failure in__kernel_map_pages() may happen today. For instance, on > > > > x86 > > > > if the kernel is built with DEBUG_PAGEALLOC. > > > > > > > > __kernel_map_pages(page, 1, 0); > > > > > > > > will need to split, say, 2M page and during the split an allocation > > > > of > > > > page table could fail. > > > > > > On x86 at least, DEBUG_PAGEALLOC expects to never have to break a page > > > on the direct map and even disables locking in cpa because it assumes > > > this. If this is happening somehow anyway then we should probably fix > > > that. Even if it's a debug feature, it will not be as useful if it is > > > causing its own crashes. > > > > > > I'm still wondering if there is something I'm missing here. It seems > > > like you are saying there is a bug in some arch's, so let's add a WARN > > > in cross-arch code to log it as it crashes. A warn and making things > > > clearer seem like good ideas, but if there is a bug we should fix it. > > > The code around the callers still functionally assume re-mapping can't > > > fail. 
> > > > Oh, I've meant x86 kernel *without* DEBUG_PAGEALLOC, and indeed the call > > that unmaps pages back in safe_copy_page will just reset a 4K page to > > NP because whatever made it NP at the first place already did the split. > > > > Still, on arm64 with DEBUG_PAGEALLOC=n there is a possibility of a race > > between map/unmap dance in __vunmap() and safe_copy_page() that may > > cause access to unmapped memory: > > > > __vunmap() > > vm_remove_mappings() > > set_direct_map_invalid() > > safe_copy_page() > > __kernel_map_pages() > > return > > do_copy_page() -> fault > > > > This is a theoretical bug, but it is still not nice :) > > > > Just to clarify: this patch series fixes this problem, right? Yes. > Will -- Sincerely yours, Mike.
Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
On Tue, Oct 27, 2020 at 10:38:16AM +0200, Mike Rapoport wrote: > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote: > > On Mon, 2020-10-26 at 11:05 +0200, Mike Rapoport wrote: > > > On Mon, Oct 26, 2020 at 01:13:52AM +, Edgecombe, Rick P wrote: > > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote: > > > > > Indeed, for architectures that define > > > > > CONFIG_ARCH_HAS_SET_DIRECT_MAP > > > > > it is > > > > > possible that __kernel_map_pages() would fail, but since this > > > > > function is > > > > > void, the failure will go unnoticed. > > > > > > > > Could you elaborate on how this could happen? Do you mean during > > > > runtime today or if something new was introduced? > > > > > > A failure in__kernel_map_pages() may happen today. For instance, on > > > x86 > > > if the kernel is built with DEBUG_PAGEALLOC. > > > > > > __kernel_map_pages(page, 1, 0); > > > > > > will need to split, say, 2M page and during the split an allocation > > > of > > > page table could fail. > > > > On x86 at least, DEBUG_PAGEALLOC expects to never have to break a page > > on the direct map and even disables locking in cpa because it assumes > > this. If this is happening somehow anyway then we should probably fix > > that. Even if it's a debug feature, it will not be as useful if it is > > causing its own crashes. > > > > I'm still wondering if there is something I'm missing here. It seems > > like you are saying there is a bug in some arch's, so let's add a WARN > > in cross-arch code to log it as it crashes. A warn and making things > > clearer seem like good ideas, but if there is a bug we should fix it. > > The code around the callers still functionally assume re-mapping can't > > fail. > > Oh, I've meant x86 kernel *without* DEBUG_PAGEALLOC, and indeed the call > that unmaps pages back in safe_copy_page will just reset a 4K page to > NP because whatever made it NP at the first place already did the split. 
> > Still, on arm64 with DEBUG_PAGEALLOC=n there is a possibility of a race > between map/unmap dance in __vunmap() and safe_copy_page() that may > cause access to unmapped memory: > > __vunmap() > vm_remove_mappings() > set_direct_map_invalid() > safe_copy_page() > __kernel_map_pages() > return > do_copy_page() -> fault > > This is a theoretical bug, but it is still not nice :) > Just to clarify: this patch series fixes this problem, right? Will
Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
On 28.10.20 12:09, Mike Rapoport wrote:
> On Tue, Oct 27, 2020 at 09:46:35AM +0100, David Hildenbrand wrote:
>> On 27.10.20 09:38, Mike Rapoport wrote:
>>> On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote:
>>>> Beyond whatever you are seeing, for the latter case of new things getting introduced to an interface with hidden dependencies... Another edge case could be a new caller to set_memory_np() could result in large NP pages. None of the callers today should cause this AFAICT, but it's not great to rely on the callers to know these details.
>>> A caller of set_memory_*() or set_direct_map_*() should expect a failure and be ready for that. So adding a WARN to safe_copy_page() is the first step in that direction :)
>> I am probably missing something important, but why are we saving/restoring the content of pages that were explicitly removed from the identity mapping such that nobody will access them?
> Actually, we should not be saving/restoring free pages during hibernation as there are several calls to mark_free_pages() that should exclude the free pages from the snapshot.
> I've tried to find why the fix that maps/unmaps a page to save it was required at the first place, but I could not find bug reports. The closest I've got is an email from Rafael that asked to update "hibernate: handle DEBUG_PAGEALLOC" patch:
> https://lore.kernel.org/linux-pm/200802200133.44098@sisk.pl/
> Could it be that safe_copy_page() tries to workaround a non-existent problem?

Could be! Also see https://lkml.kernel.org/r/38de5bb0-5559-d069-0ce0-daec66ef2...@suse.cz which restores free page content based on more kernel parameters, not based on the original content.

--
Thanks,

David / dhildenb
Re: [PATCH 0/4] arch, mm: improve robustness of direct map manipulation
On Tue, Oct 27, 2020 at 09:46:35AM +0100, David Hildenbrand wrote: > On 27.10.20 09:38, Mike Rapoport wrote: > > On Mon, Oct 26, 2020 at 06:05:30PM +, Edgecombe, Rick P wrote: > > > > > Beyond whatever you are seeing, for the latter case of new things > > > getting introduced to an interface with hidden dependencies... Another > > > edge case could be a new caller to set_memory_np() could result in > > > large NP pages. None of the callers today should cause this AFAICT, but > > > it's not great to rely on the callers to know these details. > > A caller of set_memory_*() or set_direct_map_*() should expect a failure > > and be ready for that. So adding a WARN to safe_copy_page() is the first > > step in that direction :) > > > > I am probably missing something important, but why are we saving/restoring > the content of pages that were explicitly removed from the identity mapping > such that nobody will access them? Actually, we should not be saving/restoring free pages during hibernation as there are several calls to mark_free_pages() that should exclude the free pages from the snapshot. I've tried to find why the fix that maps/unmaps a page to save it was required at the first place, but I could not find bug reports. The closest I've got is an email from Rafael that asked to update "hibernate: handle DEBUG_PAGEALLOC" patch: https://lore.kernel.org/linux-pm/200802200133.44098@sisk.pl/ Could it be that safe_copy_page() tries to workaround a non-existent problem? -- Sincerely yours, Mike.
Re: [PATCH 2/4] PM: hibernate: improve robustness of mapping pages in the direct map
On Tue, Oct 27, 2020 at 10:44:21PM +, Edgecombe, Rick P wrote: > On Tue, 2020-10-27 at 10:49 +0200, Mike Rapoport wrote: > > On Mon, Oct 26, 2020 at 06:57:32PM +, Edgecombe, Rick P wrote: > > > On Mon, 2020-10-26 at 11:15 +0200, Mike Rapoport wrote: > > > > On Mon, Oct 26, 2020 at 12:38:32AM +, Edgecombe, Rick P > > > > wrote: > > > > > On Sun, 2020-10-25 at 12:15 +0200, Mike Rapoport wrote: > > > > > > From: Mike Rapoport > > > > > > > > > > > > When DEBUG_PAGEALLOC or ARCH_HAS_SET_DIRECT_MAP is enabled a > > > > > > page > > > > > > may > > > > > > be > > > > > > not present in the direct map and has to be explicitly mapped > > > > > > before > > > > > > it > > > > > > could be copied. > > > > > > > > > > > > On arm64 it is possible that a page would be removed from the > > > > > > direct > > > > > > map > > > > > > using set_direct_map_invalid_noflush() but > > > > > > __kernel_map_pages() > > > > > > will > > > > > > refuse > > > > > > to map this page back if DEBUG_PAGEALLOC is disabled. > > > > > > > > > > It looks to me that arm64 __kernel_map_pages() will still > > > > > attempt > > > > > to > > > > > map it if rodata_full is true, how does this happen? > > > > > > > > Unless I misread the code, arm64 requires both rodata_full and > > > > debug_pagealloc_enabled() to be true for __kernel_map_pages() to > > > > do > > > > anything. > > > > But rodata_full condition applies to set_direct_map_*_noflush() > > > > as > > > > well, > > > > so with !rodata_full the linear map won't be ever changed. > > > > > > Hmm, looks to me that __kernel_map_pages() will only skip it if > > > both > > > debug pagealloc and rodata_full are false. > > > > > > But now I'm wondering if maybe we could simplify things by just > > > moving > > > the hibernate unmapped page logic off of the direct map. On x86, > > > text_poke() used to use this reserved fixmap pte thing that it > > > could > > > rely on to remap memory with. 
If hibernate had some separate pte > > > for > > > remapping like that, then we could not have any direct map > > > restrictions > > > caused by it/kernel_map_pages(), and it wouldn't have to worry > > > about > > > relying on anything else. > > > > Well, there is map_kernel_range() that can be used by hibernation as > > there is no requirement for particular virtual address, but that > > would > > be quite costly if done for every page. > > > > Maybe we can do somthing like > > > > if (kernel_page_present(s_page)) { > > do_copy_page(dst, page_address(s_page)); > > } else { > > map_kernel_range_noflush(page_address(page), PAGE_SIZE, > > PROT_READ, ); > > do_copy_page(dst, page_address(s_page)); > > unmap_kernel_range_noflush(page_address(page), > > PAGE_SIZE); > > } > > > > But it seems that a prerequisite for changing the way a page is > > mapped > > in safe_copy_page() would be to teach hibernation that a mapping here > > may fail. > > > Yea that is what I meant, the direct map could still be used for mapped > pages. > > But for the unmapped case it could have a pre-setup 4k pte for some non > direct map address. Then just change the pte to point to any unmapped > direct map page that was encountered. The point would be to give > hibernate some 4k pte of its own to manipulate so that it can't fail. > > Yet another option would be have hibernate_map_page() just map large > pages if it finds them. > > So we could teach hibernate to handle mapping failures, OR we could > change it so it doesn't rely on direct map page sizes in order to > succeed. The latter seems better to me since there isn't a reason why > it should have to fail and the resulting logic might be simpler. Both > seem like improvements in robustness though. That's correct, but as the purpose of this series is to prevent usage of __kernel_map_pages() outside DEBUG_PAGALLOC, for now I'm going to update this patch changelog and add a comment to hibernate_map_page(). -- Sincerely yours, Mike.
[PATCH] powerpc: sysdev: add missing iounmap() on error in mpic_msgr_probe()
I noticed that iounmap() of msgr_block_addr before return from mpic_msgr_probe() in the error handling case is missing. So use devm_ioremap() instead of plain ioremap() when remapping the message register block, so the mapping will be automatically released on probe failure.

Signed-off-by: Qinglang Miao
---
 arch/powerpc/sysdev/mpic_msgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/sysdev/mpic_msgr.c b/arch/powerpc/sysdev/mpic_msgr.c
index f6b253e2b..36ec0bdd8 100644
--- a/arch/powerpc/sysdev/mpic_msgr.c
+++ b/arch/powerpc/sysdev/mpic_msgr.c
@@ -191,7 +191,7 @@ static int mpic_msgr_probe(struct platform_device *dev)
 
 	/* IO map the message register block. */
 	of_address_to_resource(np, 0, &rsrc);
-	msgr_block_addr = ioremap(rsrc.start, resource_size(&rsrc));
+	msgr_block_addr = devm_ioremap(&dev->dev, rsrc.start, resource_size(&rsrc));
 	if (!msgr_block_addr) {
 		dev_err(&dev->dev, "Failed to iomap MPIC message registers");
 		return -EFAULT;
-- 
2.23.0
Re: [PATCH] powerpc: avoid broken GCC __attribute__((optimize))
On Wed, 28 Oct 2020 at 09:04, Ard Biesheuvel wrote: > > Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") > introduced a couple of uses of __attribute__((optimize)) with function > scope, to disable the stack protector in some early boot code. > > Unfortunately, and this is documented in the GCC man pages [0], overriding > function attributes for optimization is broken, and is only supported for > debug scenarios, not for production: the problem appears to be that > setting GCC -f flags using this method will cause it to forget about some > or all other optimization settings that have been applied. > > So the only safe way to disable the stack protector is to disable it for > the entire source file. > > [0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html > > Cc: Michael Ellerman > Cc: Benjamin Herrenschmidt > Cc: Paul Mackerras > Cc: Nick Desaulniers > Cc: Arvind Sankar > Cc: Randy Dunlap > Cc: Josh Poimboeuf > Cc: Thomas Gleixner > Cc: Alexei Starovoitov > Cc: Daniel Borkmann > Cc: Peter Zijlstra (Intel) > Cc: Geert Uytterhoeven > Cc: Kees Cook > Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") > Signed-off-by: Ard Biesheuvel > --- > Related discussion here: > https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=gfhpzoj_hc...@mail.gmail.com/ > > TL;DR using __attribute__((optimize("-fno-gcse"))) in the BPF interpreter > causes the compiler to forget about -fno-asynchronous-unwind-tables passed > on the command line, resulting in unexpected .eh_frame sections in vmlinux. > > arch/powerpc/kernel/Makefile | 3 +++ > arch/powerpc/kernel/paca.c | 2 +- > arch/powerpc/kernel/setup.h| 6 -- > arch/powerpc/kernel/setup_64.c | 2 +- > 4 files changed, 5 insertions(+), 8 deletions(-) > FYI i was notified by one of the robots that I missed one occurrence of __nostackprotector in arch/powerpc/kernel/paca.c Let me know if I need to resend. 
> diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile > index bf0bf1b900d2..fe2ef598e2ea 100644 > --- a/arch/powerpc/kernel/Makefile > +++ b/arch/powerpc/kernel/Makefile > @@ -173,6 +173,9 @@ KCOV_INSTRUMENT_cputable.o := n > KCOV_INSTRUMENT_setup_64.o := n > KCOV_INSTRUMENT_paca.o := n > > +CFLAGS_setup_64.o += -fno-stack-protector > +CFLAGS_paca.o += -fno-stack-protector > + > extra-$(CONFIG_PPC_FPU)+= fpu.o > extra-$(CONFIG_ALTIVEC)+= vector.o > extra-$(CONFIG_PPC64) += entry_64.o > diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c > index 0ad15768d762..fe70834d7283 100644 > --- a/arch/powerpc/kernel/paca.c > +++ b/arch/powerpc/kernel/paca.c > @@ -208,7 +208,7 @@ static struct rtas_args * __init new_rtas_args(int cpu, > unsigned long limit) > struct paca_struct **paca_ptrs __read_mostly; > EXPORT_SYMBOL(paca_ptrs); > > -void __init __nostackprotector initialise_paca(struct paca_struct *new_paca, > int cpu) > +void __init initialise_paca(struct paca_struct *new_paca, int cpu) > { > #ifdef CONFIG_PPC_PSERIES > new_paca->lppaca_ptr = NULL; > diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h > index 2ec835574cc9..2dd0d9cb5a20 100644 > --- a/arch/powerpc/kernel/setup.h > +++ b/arch/powerpc/kernel/setup.h > @@ -8,12 +8,6 @@ > #ifndef __ARCH_POWERPC_KERNEL_SETUP_H > #define __ARCH_POWERPC_KERNEL_SETUP_H > > -#ifdef CONFIG_CC_IS_CLANG > -#define __nostackprotector > -#else > -#define __nostackprotector > __attribute__((__optimize__("no-stack-protector"))) > -#endif > - > void initialize_cache_info(void); > void irqstack_early_init(void); > > diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c > index bb9cab3641d7..da447a62ea1e 100644 > --- a/arch/powerpc/kernel/setup_64.c > +++ b/arch/powerpc/kernel/setup_64.c > @@ -283,7 +283,7 @@ void __init record_spr_defaults(void) > * device-tree is not accessible via normal means at this point. 
> */ > > -void __init __nostackprotector early_setup(unsigned long dt_ptr) > +void __init early_setup(unsigned long dt_ptr) > { > static __initdata struct paca_struct boot_paca; > > -- > 2.17.1 >
[PATCH] powerpc: avoid broken GCC __attribute__((optimize))
Commit 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") introduced a couple of uses of __attribute__((optimize)) with function scope, to disable the stack protector in some early boot code. Unfortunately, and this is documented in the GCC man pages [0], overriding function attributes for optimization is broken, and is only supported for debug scenarios, not for production: the problem appears to be that setting GCC -f flags using this method will cause it to forget about some or all other optimization settings that have been applied. So the only safe way to disable the stack protector is to disable it for the entire source file. [0] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html Cc: Michael Ellerman Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Nick Desaulniers Cc: Arvind Sankar Cc: Randy Dunlap Cc: Josh Poimboeuf Cc: Thomas Gleixner Cc: Alexei Starovoitov Cc: Daniel Borkmann Cc: Peter Zijlstra (Intel) Cc: Geert Uytterhoeven Cc: Kees Cook Fixes: 7053f80d9696 ("powerpc/64: Prevent stack protection in early boot") Signed-off-by: Ard Biesheuvel --- Related discussion here: https://lore.kernel.org/lkml/CAMuHMdUg0WJHEcq6to0-eODpXPOywLot6UD2=gfhpzoj_hc...@mail.gmail.com/ TL;DR using __attribute__((optimize("-fno-gcse"))) in the BPF interpreter causes the compiler to forget about -fno-asynchronous-unwind-tables passed on the command line, resulting in unexpected .eh_frame sections in vmlinux. 
 arch/powerpc/kernel/Makefile   | 3 +++
 arch/powerpc/kernel/paca.c     | 2 +-
 arch/powerpc/kernel/setup.h    | 6 --
 arch/powerpc/kernel/setup_64.c | 2 +-
 4 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index bf0bf1b900d2..fe2ef598e2ea 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -173,6 +173,9 @@ KCOV_INSTRUMENT_cputable.o := n
 KCOV_INSTRUMENT_setup_64.o := n
 KCOV_INSTRUMENT_paca.o := n
 
+CFLAGS_setup_64.o += -fno-stack-protector
+CFLAGS_paca.o += -fno-stack-protector
+
 extra-$(CONFIG_PPC_FPU)	+= fpu.o
 extra-$(CONFIG_ALTIVEC)	+= vector.o
 extra-$(CONFIG_PPC64)	+= entry_64.o
diff --git a/arch/powerpc/kernel/paca.c b/arch/powerpc/kernel/paca.c
index 0ad15768d762..fe70834d7283 100644
--- a/arch/powerpc/kernel/paca.c
+++ b/arch/powerpc/kernel/paca.c
@@ -208,7 +208,7 @@ static struct rtas_args * __init new_rtas_args(int cpu, unsigned long limit)
 
 struct paca_struct **paca_ptrs __read_mostly;
 EXPORT_SYMBOL(paca_ptrs);
 
-void __init __nostackprotector initialise_paca(struct paca_struct *new_paca, int cpu)
+void __init initialise_paca(struct paca_struct *new_paca, int cpu)
 {
 #ifdef CONFIG_PPC_PSERIES
 	new_paca->lppaca_ptr = NULL;
diff --git a/arch/powerpc/kernel/setup.h b/arch/powerpc/kernel/setup.h
index 2ec835574cc9..2dd0d9cb5a20 100644
--- a/arch/powerpc/kernel/setup.h
+++ b/arch/powerpc/kernel/setup.h
@@ -8,12 +8,6 @@
 #ifndef __ARCH_POWERPC_KERNEL_SETUP_H
 #define __ARCH_POWERPC_KERNEL_SETUP_H
 
-#ifdef CONFIG_CC_IS_CLANG
-#define __nostackprotector
-#else
-#define __nostackprotector __attribute__((__optimize__("no-stack-protector")))
-#endif
-
 void initialize_cache_info(void);
 void irqstack_early_init(void);
 
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index bb9cab3641d7..da447a62ea1e 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -283,7 +283,7 @@ void __init record_spr_defaults(void)
 * device-tree is not accessible via normal means at this point.
 */
 
-void __init __nostackprotector early_setup(unsigned long dt_ptr)
+void __init early_setup(unsigned long dt_ptr)
 {
 	static __initdata struct paca_struct boot_paca;
 
-- 
2.17.1
Re: [PATCH 1/3] powerpc/uaccess: Switch __put_user_size_allowed() to __put_user_asm_goto()
On Okt 28 2020, Michael Ellerman wrote:

> What config and compiler are you using?

gcc 4.9.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
[PATCH kernel v3 2/2] powerpc/dma: Fallback to dma_ops when persistent memory present
So far we have been using huge DMA windows to map all the RAM available. The RAM is normally mapped to the VM address space contiguously, and there is always a reasonable upper limit for possible future hot plugged RAM, which makes it easy to map all RAM via IOMMU.

Now there is persistent memory ("ibm,pmemory" in the FDT) which (unlike normal RAM) can be mapped anywhere in the VM space beyond the maximum RAM size, and since it can be used for DMA, it requires extending the huge window up to MAX_PHYSMEM_BITS, which requires hypervisor support for:
1. huge TCE tables;
2. multilevel TCE tables;
3. huge IOMMU pages.

Certain hypervisors cannot do any of these, so the only option left is restricting the huge DMA window to include only RAM and falling back to the default DMA window for persistent memory.

This defines arch_dma_map_direct/etc to allow the generic DMA code to perform additional checks on whether direct DMA is still possible.

This checks if the system has persistent memory. If it does not, the DMA bypass mode is selected, i.e.
* dev->bus_dma_limit = 0
* dev->dma_ops_bypass = true <- this avoids calling dma_ops for mapping.

If there is such memory, this creates an identity mapping only for RAM and sets dev->bus_dma_limit to let the generic code decide whether to call into the direct DMA or the indirect DMA ops.

This should not change the existing behaviour when there is no persistent memory, as dev->dma_ops_bypass is expected to be set.
Signed-off-by: Alexey Kardashevskiy
---
 arch/powerpc/kernel/dma-iommu.c        | 70 +-
 arch/powerpc/platforms/pseries/iommu.c | 44
 arch/powerpc/Kconfig                   |  1 +
 3 files changed, 103 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
index a1c744194018..21e2d9f059a9 100644
--- a/arch/powerpc/kernel/dma-iommu.c
+++ b/arch/powerpc/kernel/dma-iommu.c
@@ -10,6 +10,63 @@
 #include
 #include
 
+#define can_map_direct(dev, addr) \
+	((dev)->bus_dma_limit >= phys_to_dma((dev), (addr)))
+
+bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr)
+{
+	if (likely(!dev->bus_dma_limit))
+		return false;
+
+	return can_map_direct(dev, addr);
+}
+EXPORT_SYMBOL_GPL(arch_dma_map_page_direct);
+
+#define is_direct_handle(dev, h) ((h) >= (dev)->archdata.dma_offset)
+
+bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle)
+{
+	if (likely(!dev->bus_dma_limit))
+		return false;
+
+	return is_direct_handle(dev, dma_handle);
+}
+EXPORT_SYMBOL_GPL(arch_dma_unmap_page_direct);
+
+bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, int nents)
+{
+	struct scatterlist *s;
+	int i;
+
+	if (likely(!dev->bus_dma_limit))
+		return false;
+
+	for_each_sg(sg, s, nents, i) {
+		if (!can_map_direct(dev, sg_phys(s) + s->offset + s->length))
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(arch_dma_map_sg_direct);
+
+bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, int nents)
+{
+	struct scatterlist *s;
+	int i;
+
+	if (likely(!dev->bus_dma_limit))
+		return false;
+
+	for_each_sg(sg, s, nents, i) {
+		if (!is_direct_handle(dev, s->dma_address + s->length))
+			return false;
+	}
+
+	return true;
+}
+EXPORT_SYMBOL(arch_dma_unmap_sg_direct);
+
 /*
  * Generic iommu implementation
  */
@@ -90,8 +147,17 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
 	struct iommu_table *tbl = get_iommu_table_base(dev);
 
 	if (dev_is_pci(dev) && dma_iommu_bypass_supported(dev, mask)) {
-		dev->dma_ops_bypass = true;
-		dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
+		/*
+		 * dma_iommu_bypass_supported() sets dma_max when there is
+		 * 1:1 mapping but it is somehow limited.
+		 * ibm,pmemory is one example.
+		 */
+		dev->dma_ops_bypass = dev->bus_dma_limit == 0;
+		if (!dev->dma_ops_bypass)
+			dev_warn(dev, "iommu: 64-bit OK but direct DMA is limited by %llx\n",
+				 dev->bus_dma_limit);
+		else
+			dev_dbg(dev, "iommu: 64-bit OK, using fixed ops\n");
 		return 1;
 	}
 
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index e4198700ed1a..91112e748491 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -839,7 +839,7 @@ static void remove_ddw(struct device_node *np, bool remove_prop)
 			np, ret);
 }
 
-static u64 find_existing_ddw(struct device_node *pdn)
+static u64 find_existing_ddw(struct device_node *pdn, int *window_shift)
 {
 	struct direct_window *window;
[PATCH kernel v3 0/2] DMA, powerpc/dma: Fallback to dma_ops when persistent memory present
This allows mixing direct DMA (to/from RAM) and IOMMU (to/from persistent memory) on the PPC64/pseries platform.

This replaces https://lkml.org/lkml/2020/10/27/418
which replaces https://lkml.org/lkml/2020/10/20/1085

This is based on sha1 4525c8781ec0 Linus Torvalds "scsi: qla2xxx: remove incorrect sparse #ifdef".

Please comment. Thanks.

Alexey Kardashevskiy (2):
  dma: Allow mixing bypass and mapped DMA operation
  powerpc/dma: Fallback to dma_ops when persistent memory present

 arch/powerpc/kernel/dma-iommu.c        | 70 +-
 arch/powerpc/platforms/pseries/iommu.c | 44
 kernel/dma/mapping.c                   | 24 +++--
 arch/powerpc/Kconfig                   |  1 +
 kernel/dma/Kconfig                     |  4 ++
 5 files changed, 127 insertions(+), 16 deletions(-)

-- 
2.17.1
[PATCH kernel v3 1/2] dma: Allow mixing bypass and mapped DMA operation
At the moment we allow bypassing DMA ops only when we can do this for the entire RAM. However there are configs with mixed type memory where we could still allow bypassing IOMMU in most cases; POWERPC with persistent memory is one example.

This adds an arch hook to determine where bypass can still work, and we invoke the direct DMA API in those cases. The following patch checks the bus limit on POWERPC to allow or disallow direct mapping.

This adds a CONFIG_ARCH_HAS_DMA_MAP_DIRECT config option to make the arch_ hooks no-op by default.

Signed-off-by: Alexey Kardashevskiy
---
 kernel/dma/mapping.c | 24
 kernel/dma/Kconfig   |  4
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 51bb8fa8eb89..a0bc9eb876ed 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -137,6 +137,18 @@ static inline bool dma_map_direct(struct device *dev,
 	return dma_go_direct(dev, *dev->dma_mask, ops);
 }
 
+#ifdef CONFIG_ARCH_HAS_DMA_MAP_DIRECT
+bool arch_dma_map_page_direct(struct device *dev, phys_addr_t addr);
+bool arch_dma_unmap_page_direct(struct device *dev, dma_addr_t dma_handle);
+bool arch_dma_map_sg_direct(struct device *dev, struct scatterlist *sg, int nents);
+bool arch_dma_unmap_sg_direct(struct device *dev, struct scatterlist *sg, int nents);
+#else
+#define arch_dma_map_page_direct(d, a) (0)
+#define arch_dma_unmap_page_direct(d, a) (0)
+#define arch_dma_map_sg_direct(d, s, n) (0)
+#define arch_dma_unmap_sg_direct(d, s, n) (0)
+#endif
+
 dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 		size_t offset, size_t size, enum dma_data_direction dir,
 		unsigned long attrs)
@@ -149,7 +161,8 @@ dma_addr_t dma_map_page_attrs(struct device *dev, struct page *page,
 	if (WARN_ON_ONCE(!dev->dma_mask))
 		return DMA_MAPPING_ERROR;
 
-	if (dma_map_direct(dev, ops))
+	if (dma_map_direct(dev, ops) ||
+	    arch_dma_map_page_direct(dev, page_to_phys(page) + offset + size))
 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
 	else
 		addr = ops->map_page(dev, page, offset, size, dir, attrs);
@@ -165,7 +178,8 @@ void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr, size_t size,
 	const struct dma_map_ops *ops = get_dma_ops(dev);
 
 	BUG_ON(!valid_dma_direction(dir));
-	if (dma_map_direct(dev, ops))
+	if (dma_map_direct(dev, ops) ||
+	    arch_dma_unmap_page_direct(dev, addr + size))
 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
 	else if (ops->unmap_page)
 		ops->unmap_page(dev, addr, size, dir, attrs);
@@ -188,7 +202,8 @@ int dma_map_sg_attrs(struct device *dev, struct scatterlist *sg, int nents,
 	if (WARN_ON_ONCE(!dev->dma_mask))
 		return 0;
 
-	if (dma_map_direct(dev, ops))
+	if (dma_map_direct(dev, ops) ||
+	    arch_dma_map_sg_direct(dev, sg, nents))
 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
 	else
 		ents = ops->map_sg(dev, sg, nents, dir, attrs);
@@ -207,7 +222,8 @@ void dma_unmap_sg_attrs(struct device *dev, struct scatterlist *sg,
 	BUG_ON(!valid_dma_direction(dir));
 	debug_dma_unmap_sg(dev, sg, nents, dir);
-	if (dma_map_direct(dev, ops))
+	if (dma_map_direct(dev, ops) ||
+	    arch_dma_unmap_sg_direct(dev, sg, nents))
 		dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
 	else if (ops->unmap_sg)
 		ops->unmap_sg(dev, sg, nents, dir, attrs);
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index c99de4a21458..43d106598e82 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -20,6 +20,10 @@ config DMA_OPS
 config DMA_OPS_BYPASS
 	bool
 
+# Lets platform IOMMU driver choose between bypass and IOMMU
+config ARCH_HAS_DMA_MAP_DIRECT
+	bool
+
 config NEED_SG_DMA_LENGTH
 	bool
-- 
2.17.1
Re: [PATCH kernel v2 1/2] dma: Allow mixing bypass and normal IOMMU operation
On 28/10/2020 03:48, Christoph Hellwig wrote:
>> +static inline bool dma_handle_direct(struct device *dev, dma_addr_t dma_handle)
>> +{
>> +	return dma_handle >= dev->archdata.dma_offset;
>> +}
>
> This won't compile except for powerpc, and directly accesing arch members in common code is a bad idea. Maybe both your helpers need to be supplied by arch code to better abstract this out.

rats, overlooked it :( bus_dma_limit is generic but dma_offset is in archdata :-/

>> 	if (dma_map_direct(dev, ops))
>> 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
>> +	else if (dev->bus_dma_limit &&
>> +		 can_map_direct(dev, (phys_addr_t) page_to_phys(page) + offset + size))
>> +		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);
>> +#endif
>
> I don't think page_to_phys needs a phys_addr_t on the return value. I'd also much prefer if we make this a little more beautiful, here are a few suggestions:
>
>  - hide the bus_dma_limit check inside can_map_direct, and provide a stub so that we can avoid the ifdef
>  - use a better name for can_map_direct, and maybe also a better calling convention by passing the page (the sg code also has the page),

It is passing an address of the end of the mapped area so passing a page struct means passing page and offset which is an extra parameter and we do not want to do anything with the page in those hooks anyway so I'd keep it as is.

>    and maybe even hide the dma_map_direct inside it.

Call dma_map_direct() from arch_dma_map_page_direct() if arch_dma_map_page_direct() is defined? Seems suboptimal as it is going to be bypass=true in most cases and we save one call by avoiding calling arch_dma_map_page_direct(). Unless I missed something?

> 	if (dma_map_direct(dev, ops) ||
> 	    arch_dma_map_page_direct(dev, page, offset, size))
> 		addr = dma_direct_map_page(dev, page, offset, size, dir, attrs);

>> 	BUG_ON(!valid_dma_direction(dir));
>> 	if (dma_map_direct(dev, ops))
>> 		dma_direct_unmap_page(dev, addr, size, dir, attrs);
>> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
>> +	else if (dev->bus_dma_limit && dma_handle_direct(dev, addr + size))
>> +		dma_direct_unmap_page(dev, addr, size, dir, attrs);
>> +#endif
>
> Same here.

>> 	if (dma_map_direct(dev, ops))
>> 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
>> +	else if (dev->bus_dma_limit) {
>> +		struct scatterlist *s;
>> +		bool direct = true;
>> +		int i;
>> +
>> +		for_each_sg(sg, s, nents, i) {
>> +			direct = can_map_direct(dev, sg_phys(s) + s->offset + s->length);
>> +			if (!direct)
>> +				break;
>> +		}
>> +		if (direct)
>> +			ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
>> +		else
>> +			ents = ops->map_sg(dev, sg, nents, dir, attrs);
>> +	}
>> +#endif
>
> This needs to go into a helper as well. I think the same style as above would work pretty nicely as well:

Yup. I'll repost v3 soon with this change. Thanks for the review.

> 	if (dma_map_direct(dev, ops) ||
> 	    arch_dma_map_sg_direct(dev, sg, nents))
> 		ents = dma_direct_map_sg(dev, sg, nents, dir, attrs);
> 	else
> 		ents = ops->map_sg(dev, sg, nents, dir, attrs);

>> +#ifdef CONFIG_DMA_OPS_BYPASS_BUS_LIMIT
>> +	if (dev->bus_dma_limit) {
>> +		struct scatterlist *s;
>> +		bool direct = true;
>> +		int i;
>> +
>> +		for_each_sg(sg, s, nents, i) {
>> +			direct = dma_handle_direct(dev, s->dma_address + s->length);
>> +			if (!direct)
>> +				break;
>> +		}
>> +		if (direct) {
>> +			dma_direct_unmap_sg(dev, sg, nents, dir, attrs);
>> +			return;
>> +		}
>> +	}
>> +#endif
>
> One more time here..

-- 
Alexey