Re: [PATCH] ASoC: imx-audmux: Add driver suspend and resume to support MEGA Fast

2019-08-16 Thread S.j. Wang
Hi Mark

> 
> On Fri, Aug 16, 2019 at 01:03:14AM -0400, Shengjiu Wang wrote:
> 
> > +   for (i = 0; i < reg_max; i++)
> > +   regcache[i] = readl(audmux_base + i * 4);
> 
> If only there were some framework which provided a register cache!  

Yes, as a next step I can refine this driver to use regmap.

Best regards
Wang shengjiu
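For readers following along, the save/restore pattern under discussion can be simulated in plain user-space C (the register file, REG_MAX, and both helpers are inventions of this sketch; only the loop shape mirrors the quoted patch, and regmap's regcache would replace exactly this bookkeeping):

```c
#include <assert.h>
#include <stdint.h>

#define REG_MAX 8                    /* hypothetical number of AUDMUX registers */

static uint32_t fake_regs[REG_MAX];  /* stands in for the MMIO region */
static uint32_t regcache[REG_MAX];   /* driver-private cache, as in the patch */

/* Save all registers before suspend, mirroring the quoted readl() loop. */
static void audmux_suspend(void)
{
	int i;

	for (i = 0; i < REG_MAX; i++)
		regcache[i] = fake_regs[i];  /* readl(audmux_base + i * 4) */
}

/* Write the cached values back on resume. */
static void audmux_resume(void)
{
	int i;

	for (i = 0; i < REG_MAX; i++)
		fake_regs[i] = regcache[i];  /* writel(regcache[i], audmux_base + i * 4) */
}
```

With regmap, a single regcache_sync() call on resume would replace the hand-rolled cache and restore loop, which is what the review comment points at.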


Re: [PATCH 0/6] drm+dma: cache support for arm, etc

2019-08-16 Thread Rob Clark
On Thu, Aug 15, 2019 at 10:53 AM Christoph Hellwig  wrote:
>
> On Thu, Aug 15, 2019 at 06:54:39AM -0700, Rob Clark wrote:
> > On Wed, Aug 14, 2019 at 11:51 PM Christoph Hellwig  wrote:
> > >
> > > As said before I don't think these low-level helpers are the
> > > right API to export, but even if they did you'd just cover a tiny
> > > subset of the architectures.
> >
> > Are you thinking instead something like:
> >
> > void dma_sync_sg_for_{cpu,device}(struct device *dev, struct scatterlist 
> > *sgl,
> >   int nents, enum dma_data_direction dir)
> > {
> > for_each_sg(sgl, sg, nents, i) {
> > arch_sync_dma_for_..(dev, sg_phys(sg), sg->length, dir);
> > }
> > }
> > EXPORT_SYMBOL_GPL(dma_sync_sg_for_..)
> >
> > or did you have something else in mind?
>
> No.  We really need an interface that says please give me uncached
> memory (for some definition of uncached that includes what graphics
> drivers call write combine), and then let the architecture do the right
> thing.  Basically dma_alloc_coherent with DMA_ATTR_NO_KERNEL_MAPPING
> is superficially close to what you want, except that the way the drm
> drivers work you can't actually use it.

I don't disagree about needing an API to get uncached memory (or
ideally just something outside of the linear map).  But I think this
is a separate problem.

What I was hoping for, for v5.4, is a way to stop abusing dma_map/sync
for cache ops to get rid of the hack I had to make for v5.3.  And also
to fix vgem on non-x86.  (Unfortunately changing vgem to use cached
mappings breaks x86 CI, but fixes CI on arm/arm64..)  We can do that
without any changes in allocation.  There is still the possibility for
problems due to cached alias, but that has been a problem this whole
time, it isn't something new.

BR,
-R

> The reason for that is if we can we really need to not create another
> uncachable alias, but instead change the page attributes in place.
> On x86 we can and must do that for example, and based on the
> conversation with Will arm64 could do that fairly easily.  arm32 can
> right now only do that for CMA, though.
>
> The big question is what API do we want.  I had a pretty similar
> discussion with Christian on doing such an allocation for amdgpu,
> where the device normally is cache coherent, but they actually want
> to turn it into non-coherent by using PCIe unsnooped transactions.
>
> Here is my high level plan, which still has a few loose ends:
>
>  (1) provide a new API:
>
> struct page *dma_alloc_pages(struct device *dev, unsigned nr_pages,
> gfp_t gfp, unsigned long flags);
> void dma_free_pages(struct device *dev, unsigned nr_pages,
> unsigned long flags);
>
>  These give you back page backed memory that is guaranteed to be
>  addressable by the device (no swiotlb or similar).  The memory can
>  then be mapped using dma_map*, including unmap and dma_sync to
>  bounce ownership around.  This is the replacement for the current
>  dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT API, that is rather
>  badly defined.
>
>  (2) Add support for DMA_ATTR_NO_KERNEL_MAPPING to this new API instead
>  of dma_alloc_attrs.  The initial difference with that flag is just
>  that we allow highmem, but in the future we could also unmap this
>  memory from the kernel linear mapping entirely on architectures
>  where we can easily do that.
>
>  (3) Add a dma_pages_map/dma_pages_unmap or similar API that allows you
>  to get a kernel mapping for parts or all of a
>  DMA_ATTR_NO_KERNEL_MAPPING allocation.  This is to replace things
>  like your open-coded vmap in msm (or similarly elsewhere in dma-buf
>  providers).
>
>  (4) Add support for a DMA_ATTR_UNCACHABLE flags (or similar) to the new
>  API, that maps the pages as uncachable iff they have a kernel
>  mapping, including invalidating the caches at time of this page
>  attribute change (or creation of a new mapping).  This API will fail
>  if the architecture does not allow in-place remapping.  Note that for
>  arm32 we could always dip into the CMA pool if one is present to not
>  fail.  We'll also need some helper to map from the DMA_ATTR_* flags
>  to a pgprot for mapping the page to userspace.  There is also a few
>  other weird bits here, e.g. on architectures like mips that use an
>  uncached segment we'll have to fail use with the plain
>  DMA_ATTR_UNCACHABLE flag, but it could be supported with
>  DMA_ATTR_UNCACHABLE | DMA_ATTR_NO_KERNEL_MAPPING.
>
> I was hoping to get most of this done for this merge window, but I'm
> probably lucky if I get at least parts done.  Too much distraction.
>
> > Hmm, not entirely sure why.. you should be on the cc list for each
> > individual patch.
>
> They finally made it, although even with the delay they only ended up
> in the spam mailbox.  I still can't see them on the 
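As a rough illustration of how a caller might use the proposed API from the plan above, here is a user-space toy with stubbed allocators. This API does not exist yet; the flag names follow the email, while struct page, the allocator bodies, and the extra pages argument to the free function (needed so the toy can actually free) are all inventions of this sketch:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the proposed flags; names follow the email. */
#define DMA_ATTR_NO_KERNEL_MAPPING (1UL << 0)
#define DMA_ATTR_UNCACHABLE        (1UL << 1)

struct device;                  /* opaque, as in the kernel */
struct page { void *vaddr; };   /* toy page for this simulation only */

/* Toy dma_alloc_pages(): the real one would return page-backed memory
 * guaranteed to be addressable by the device; here we just allocate
 * one toy struct per requested page. */
static struct page *dma_alloc_pages(struct device *dev, unsigned nr_pages,
				    unsigned long flags)
{
	(void)dev; (void)flags;
	return calloc(nr_pages, sizeof(struct page));
}

static void dma_free_pages(struct device *dev, struct page *pages,
			   unsigned nr_pages, unsigned long flags)
{
	(void)dev; (void)nr_pages; (void)flags;
	free(pages);
}
```

A graphics driver wanting uncached, unmapped memory (step 4 of the plan) would then combine the flags: dma_alloc_pages(dev, n, DMA_ATTR_UNCACHABLE | DMA_ATTR_NO_KERNEL_MAPPING).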

Re: [PATCH v2 09/44] powerpc/64s/pseries: machine check convert to use common event code

2019-08-16 Thread Michael Ellerman
kbuild test robot  writes:
> Hi Nicholas,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on linus/master]
> [cannot apply to v5.3-rc3 next-20190807]
> [if your patch is applied to the wrong git tree, please drop us a note to 
> help improve the system]
>
> url:
> https://github.com/0day-ci/linux/commits/Nicholas-Piggin/powerpc-64s-exception-cleanup-and-macrofiy/20190802-11
> config: powerpc-defconfig (attached as .config)
> compiler: powerpc64-linux-gcc (GCC) 7.4.0
> reproduce:
> wget 
> https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
> ~/bin/make.cross
> chmod +x ~/bin/make.cross
> # save the attached .config to linux build tree
> GCC_VERSION=7.4.0 make.cross ARCH=powerpc 
>
> If you fix the issue, kindly add following tag
> Reported-by: kbuild test robot 
>
> All errors (new ones prefixed by >>):
>
>arch/powerpc/platforms/pseries/ras.c: In function 'mce_handle_error':
>>> arch/powerpc/platforms/pseries/ras.c:563:28: error: this statement may fall 
>>> through [-Werror=implicit-fallthrough=]
>mce_err.u.ue_error_type = MCE_UE_ERROR_IFETCH;
>^
>arch/powerpc/platforms/pseries/ras.c:564:3: note: here
>   case MC_ERROR_UE_PAGE_TABLE_WALK_IFETCH:
>   ^~~~
>arch/powerpc/platforms/pseries/ras.c:565:28: error: this statement may 
> fall through [-Werror=implicit-fallthrough=]
>mce_err.u.ue_error_type = MCE_UE_ERROR_PAGE_TABLE_WALK_IFETCH;
>^
>arch/powerpc/platforms/pseries/ras.c:566:3: note: here
>   case MC_ERROR_UE_LOAD_STORE:
>   ^~~~
>arch/powerpc/platforms/pseries/ras.c:567:28: error: this statement may 
> fall through [-Werror=implicit-fallthrough=]
>mce_err.u.ue_error_type = MCE_UE_ERROR_LOAD_STORE;
>^
>arch/powerpc/platforms/pseries/ras.c:568:3: note: here
>   case MC_ERROR_UE_PAGE_TABLE_WALK_LOAD_STORE:
>   ^~~~
>arch/powerpc/platforms/pseries/ras.c:569:28: error: this statement may 
> fall through [-Werror=implicit-fallthrough=]
>mce_err.u.ue_error_type = MCE_UE_ERROR_PAGE_TABLE_WALK_LOAD_STORE;
>^
>arch/powerpc/platforms/pseries/ras.c:570:3: note: here
>   case MC_ERROR_UE_INDETERMINATE:
>   ^~~~
>cc1: all warnings being treated as errors

I think you meant to break in all these cases?

cheers
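The pattern GCC's -Wimplicit-fallthrough is flagging, and the fix Michael suggests, can be reduced to a tiny standalone example (toy enums standing in for the pseries MCE types; this is not the actual ras.c code):

```c
#include <assert.h>

/* Toy versions of the pseries MCE enums, just to reproduce the pattern. */
enum mc_error { MC_ERROR_UE_IFETCH, MC_ERROR_UE_LOAD_STORE };
enum ue_type  {
	MCE_UE_ERROR_UNKNOWN = 0,
	MCE_UE_ERROR_IFETCH,
	MCE_UE_ERROR_LOAD_STORE,
};

static enum ue_type classify(enum mc_error err)
{
	enum ue_type t = MCE_UE_ERROR_UNKNOWN;

	switch (err) {
	case MC_ERROR_UE_IFETCH:
		t = MCE_UE_ERROR_IFETCH;
		break;  /* without this, control falls into the next case and
			 * the later assignment silently overwrites this one */
	case MC_ERROR_UE_LOAD_STORE:
		t = MCE_UE_ERROR_LOAD_STORE;
		break;
	}
	return t;
}
```

Without the break statements, every earlier case would end up reporting the last error type in the chain, which is why the warning is an error here and not just noise.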


[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #37 from Erhard F. (erhar...@mailbox.org) ---
On Fri, 16 Aug 2019 15:20:47 +
bugzilla-dae...@bugzilla.kernel.org wrote:

Ok, tested the G5 + patch now. It boots from a btrfs partition with SLUB
debugging + btrfs debug & selftests enabled. So at least on the PowerPC side
everything is back to working condition again.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Re: [PATCH 6/6] arm64: document the choice of page attributes for pgprot_dmacoherent

2019-08-16 Thread Will Deacon
On Fri, Aug 16, 2019 at 07:59:42PM +0200, Christoph Hellwig wrote:
> On Fri, Aug 16, 2019 at 06:31:18PM +0100, Will Deacon wrote:
> > Mind if I tweak the second sentence to be:
> > 
> >   This is different from "Device-nGnR[nE]" memory which is intended for MMIO
> >   and thus forbids speculation, preserves access size, requires strict
> >   alignment and can also force write responses to come from the endpoint.
> > 
> > ? It's a small change, but it better fits with the arm64 terminology
> > ("strongly ordered" is no longer used in the architecture).
> > 
> > If you're happy with that, I can make the change and queue this patch
> > for 5.4.
> 
> I'm fine with the change, but you really need this series as base,
> as there is no pgprot_dmacoherent before the series.  So I think I'll
> have to queue it up if we want it for 5.4, and I'll need a few more
> reviews for the other patches in this series first.

Ah, I didn't think about the contextual stuff. In which case, with my
change in wording:

Acked-by: Will Deacon 

and feel free to route it with the rest.

Thanks,

Will


[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #36 from Erhard F. (erhar...@mailbox.org) ---
On Fri, 16 Aug 2019 15:20:47 +
bugzilla-dae...@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=204371
> 
> --- Comment #35 from Christophe Leroy (christophe.le...@c-s.fr) ---
> That's good news. Will you handle submitting the patch to BTRFS file 
> system ?
That's nice of you. But as my part in this process was only searching &
replacing some code without deeper knowledge of what it's doing, I guess the
patch is yours. ;) Also if any questions or follow-up patches arise I am not
the right person to ask.

And probably I should test it on the G5 first, the 'BUG kmalloc-4k (Tainted:
G W): Object padding overwritten' happened here too.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Re: [PATCH 6/6] arm64: document the choice of page attributes for pgprot_dmacoherent

2019-08-16 Thread Christoph Hellwig
On Fri, Aug 16, 2019 at 06:31:18PM +0100, Will Deacon wrote:
> Mind if I tweak the second sentence to be:
> 
>   This is different from "Device-nGnR[nE]" memory which is intended for MMIO
>   and thus forbids speculation, preserves access size, requires strict
>   alignment and can also force write responses to come from the endpoint.
> 
> ? It's a small change, but it better fits with the arm64 terminology
> ("strongly ordered" is no longer used in the architecture).
> 
> If you're happy with that, I can make the change and queue this patch
> for 5.4.

I'm fine with the change, but you really need this series as base,
as there is no pgprot_dmacoherent before the series.  So I think I'll
have to queue it up if we want it for 5.4, and I'll need a few more
reviews for the other patches in this series first.


Re: [PATCH v4 1/3] kasan: support backing vmalloc space with real shadow memory

2019-08-16 Thread Andy Lutomirski
On Fri, Aug 16, 2019 at 10:08 AM Mark Rutland  wrote:
>
> Hi Christophe,
>
> On Fri, Aug 16, 2019 at 09:47:00AM +0200, Christophe Leroy wrote:
> > Le 15/08/2019 à 02:16, Daniel Axtens a écrit :
> > > Hook into vmalloc and vmap, and dynamically allocate real shadow
> > > memory to back the mappings.
> > >
> > > Most mappings in vmalloc space are small, requiring less than a full
> > > page of shadow space. Allocating a full shadow page per mapping would
> > > therefore be wasteful. Furthermore, to ensure that different mappings
> > > use different shadow pages, mappings would have to be aligned to
> > > KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.
> > >
> > > Instead, share backing space across multiple mappings. Allocate
> > > a backing page the first time a mapping in vmalloc space uses a
> > > particular page of the shadow region. Keep this page around
> > > regardless of whether the mapping is later freed - in the mean time
> > > the page could have become shared by another vmalloc mapping.
> > >
> > > This can in theory lead to unbounded memory growth, but the vmalloc
> > > allocator is pretty good at reusing addresses, so the practical memory
> > > usage grows at first but then stays fairly stable.
> >
> > I guess people having gigabytes of memory don't mind, but I'm concerned
> > about tiny targets with very little amount of memory. I have boards with as
> > little as 32Mbytes of RAM. The shadow region for the linear space already
> > takes one eighth of the RAM. I'd rather avoid keeping unused shadow pages
> > busy.
>
> I think this depends on how much shadow would be in constant use vs what
> would get left unused. If the amount in constant use is sufficiently
> large (or the residue is sufficiently small), then it may not be
> worthwhile to support KASAN_VMALLOC on such small systems.
>
> > Each page of shadow memory represents 8 pages of real memory. Could we use
> > page_ref to count how many pieces of a shadow page are used so that we can
> > free it when the ref count decreases to 0?
> >
> > > This requires architecture support to actually use: arches must stop
> > > mapping the read-only zero page over the portion of the shadow region
> > > that covers the vmalloc space and instead leave it unmapped.
> >
> > Why 'must'? Couldn't we switch back and forth from the zero page to real
> > page on demand?
> >
> > If the zero page is not mapped for unused vmalloc space, bad memory accesses
> > will Oops on the shadow memory access instead of Oopsing on the real bad
> > access, making it more difficult to locate and identify the issue.
>
> I agree this isn't nice, though FWIW this can already happen today for
> bad addresses that fall outside of the usual kernel address space. We
> could make the !KASAN_INLINE checks resilient to this by using
> probe_kernel_read() to check the shadow, and treating unmapped shadow as
> poison.

Could we instead modify the page fault handlers to detect this case
and print a useful message?


Re: [PATCH 6/6] arm64: document the choice of page attributes for pgprot_dmacoherent

2019-08-16 Thread Mark Rutland
On Fri, Aug 16, 2019 at 06:31:18PM +0100, Will Deacon wrote:
> Hi Christoph,
> 
> Thanks for spinning this into a patch.
> 
> On Fri, Aug 16, 2019 at 09:07:54AM +0200, Christoph Hellwig wrote:
> > Based on an email from Will Deacon.
> > 
> > Signed-off-by: Christoph Hellwig 
> > ---
> >  arch/arm64/include/asm/pgtable.h | 8 
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/pgtable.h 
> > b/arch/arm64/include/asm/pgtable.h
> > index 6700371227d1..6ff221d9a631 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -435,6 +435,14 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
> > __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_NC) | 
> > PTE_PXN | PTE_UXN)
> >  #define pgprot_device(prot) \
> > __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRE) 
> > | PTE_PXN | PTE_UXN)
> > +/*
> > + * DMA allocations for non-coherent devices use what the Arm architecture 
> > calls
> > + * "Normal non-cacheable" memory, which permits speculation, unaligned 
> > accesses
> > + * and merging of writes.  This is different from "Strongly Ordered" memory
> > + * which is intended for MMIO and thus forbids speculation, preserves 
> > access
> > + * size, requires strict alignment and also forces write responses to come 
> > from
> > + * the endpoint.
> > + */
> 
> Mind if I tweak the second sentence to be:
> 
>   This is different from "Device-nGnR[nE]" memory which is intended for MMIO
>   and thus forbids speculation, preserves access size, requires strict
>   alignment and can also force write responses to come from the endpoint.
> 
> ? It's a small change, but it better fits with the arm64 terminology
> ("strongly ordered" is no longer used in the architecture).
> 
> If you're happy with that, I can make the change and queue this patch
> for 5.4.

FWIW, with that wording:

Acked-by: Mark Rutland 

Mark.


Re: [PATCH 6/6] arm64: document the choice of page attributes for pgprot_dmacoherent

2019-08-16 Thread Will Deacon
Hi Christoph,

Thanks for spinning this into a patch.

On Fri, Aug 16, 2019 at 09:07:54AM +0200, Christoph Hellwig wrote:
> Based on an email from Will Deacon.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/arm64/include/asm/pgtable.h | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/pgtable.h 
> b/arch/arm64/include/asm/pgtable.h
> index 6700371227d1..6ff221d9a631 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -435,6 +435,14 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
>   __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_NC) | 
> PTE_PXN | PTE_UXN)
>  #define pgprot_device(prot) \
>   __pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRE) 
> | PTE_PXN | PTE_UXN)
> +/*
> + * DMA allocations for non-coherent devices use what the Arm architecture 
> calls
> + * "Normal non-cacheable" memory, which permits speculation, unaligned 
> accesses
> + * and merging of writes.  This is different from "Strongly Ordered" memory
> + * which is intended for MMIO and thus forbids speculation, preserves access
> + * size, requires strict alignment and also forces write responses to come 
> from
> + * the endpoint.
> + */

Mind if I tweak the second sentence to be:

  This is different from "Device-nGnR[nE]" memory which is intended for MMIO
  and thus forbids speculation, preserves access size, requires strict
  alignment and can also force write responses to come from the endpoint.

? It's a small change, but it better fits with the arm64 terminology
("strongly ordered" is no longer used in the architecture).

If you're happy with that, I can make the change and queue this patch
for 5.4.

Thanks,

Will


Re: [PATCH v4 1/3] kasan: support backing vmalloc space with real shadow memory

2019-08-16 Thread Mark Rutland
Hi Christophe,

On Fri, Aug 16, 2019 at 09:47:00AM +0200, Christophe Leroy wrote:
> Le 15/08/2019 à 02:16, Daniel Axtens a écrit :
> > Hook into vmalloc and vmap, and dynamically allocate real shadow
> > memory to back the mappings.
> > 
> > Most mappings in vmalloc space are small, requiring less than a full
> > page of shadow space. Allocating a full shadow page per mapping would
> > therefore be wasteful. Furthermore, to ensure that different mappings
> > use different shadow pages, mappings would have to be aligned to
> > KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.
> > 
> > Instead, share backing space across multiple mappings. Allocate
> > a backing page the first time a mapping in vmalloc space uses a
> > particular page of the shadow region. Keep this page around
> > regardless of whether the mapping is later freed - in the mean time
> > the page could have become shared by another vmalloc mapping.
> > 
> > This can in theory lead to unbounded memory growth, but the vmalloc
> > allocator is pretty good at reusing addresses, so the practical memory
> > usage grows at first but then stays fairly stable.
> 
> I guess people having gigabytes of memory don't mind, but I'm concerned
> about tiny targets with very little amount of memory. I have boards with as
> little as 32Mbytes of RAM. The shadow region for the linear space already
> takes one eighth of the RAM. I'd rather avoid keeping unused shadow pages
> busy.

I think this depends on how much shadow would be in constant use vs what
would get left unused. If the amount in constant use is sufficiently
large (or the residue is sufficiently small), then it may not be
worthwhile to support KASAN_VMALLOC on such small systems.

> Each page of shadow memory represents 8 pages of real memory. Could we use
> page_ref to count how many pieces of a shadow page are used so that we can
> free it when the ref count decreases to 0?
> 
> > This requires architecture support to actually use: arches must stop
> > mapping the read-only zero page over the portion of the shadow region
> > that covers the vmalloc space and instead leave it unmapped.
> 
> Why 'must'? Couldn't we switch back and forth from the zero page to real
> page on demand?
>
> If the zero page is not mapped for unused vmalloc space, bad memory accesses
> will Oops on the shadow memory access instead of Oopsing on the real bad
> access, making it more difficult to locate and identify the issue.

I agree this isn't nice, though FWIW this can already happen today for
bad addresses that fall outside of the usual kernel address space. We
could make the !KASAN_INLINE checks resilient to this by using
probe_kernel_read() to check the shadow, and treating unmapped shadow as
poison.

It's also worth noting that flipping back and forth isn't generally safe
unless going via an invalid table entry, so there'd still be windows
where a bad access might not have shadow mapped.

We'd need to reuse the common p4d/pud/pmd/pte tables for unallocated
regions, or the tables alone would consume significant amounts of memory
(e.g. ~32GiB for arm64 defconfig), and thus we'd need to be able to
switch all levels between pgd and pte, which is much more complicated.

I strongly suspect that the additional complexity will outweigh the
benefit.

[...]

> > +#ifdef CONFIG_KASAN_VMALLOC
> > +static int kasan_populate_vmalloc_pte(pte_t *ptep, unsigned long addr,
> > + void *unused)
> > +{
> > +   unsigned long page;
> > +   pte_t pte;
> > +
> > +   if (likely(!pte_none(*ptep)))
> > +   return 0;
> 
> Prior to this, the zero shadow area should be mapped, and the test should
> be:
> 
> if (likely(pte_pfn(*ptep) != PHYS_PFN(__pa(kasan_early_shadow_page))))
>   return 0;

As above, this would need a more comprehensive redesign, so I don't
think it's worth going into that level of nit here. :)

If we do try to use common shadow for unallocated VA ranges, it probably
makes sense to have a common poison page that we can use, so that we can
report vmalloc-out-of-bounds.

Thanks,
Mark.


[PATCH v5 23/23] PCI: pciehp: movable BARs: Trigger a domain rescan on hp events

2019-08-16 Thread Sergey Miroshnichenko
With movable BARs, adding a hotplugged device is not local to its bridge
anymore, but it affects the whole domain: BARs, bridge windows and bus
numbers can be substantially rearranged. So instead of trying to fit the
new devices into preallocated reserved gaps, initiate a full domain rescan.

The pci_rescan_bus() covers all the operations of the replaced functions:
 - assigning new bus numbers, as the pci_hp_add_bridge() does it;
 - allocating BARs (pci_assign_unassigned_bridge_resources());
 - configuring MPS settings (pcie_bus_configure_settings());
 - binding devices to their drivers (pci_bus_add_devices()).
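The replaced sequence can be sketched with user-space stubs that only record call order. All the function bodies below are invented; only the names and the ordering come from the commit message, which says the real pci_rescan_bus() performs these steps internally:

```c
#include <assert.h>
#include <string.h>

/* Record the order in which the (stubbed) steps run. */
static char call_log[128];

static void step(const char *name)
{
	strcat(call_log, name);
	strcat(call_log, ";");
}

/* Stubs named after the functions pci_rescan_bus() replaces here. */
static void pci_hp_add_bridge_stub(void)                      { step("buses"); }
static void pci_assign_unassigned_bridge_resources_stub(void) { step("bars"); }
static void pcie_bus_configure_settings_stub(void)            { step("mps"); }
static void pci_bus_add_devices_stub(void)                    { step("bind"); }

/* Toy pci_rescan_bus(): the claim in the commit message is that the real
 * function covers all four of these operations in one call. */
static void pci_rescan_bus_stub(void)
{
	pci_hp_add_bridge_stub();
	pci_assign_unassigned_bridge_resources_stub();
	pcie_bus_configure_settings_stub();
	pci_bus_add_devices_stub();
}
```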

CC: Lukas Wunner 
Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/hotplug/pciehp_pci.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_pci.c b/drivers/pci/hotplug/pciehp_pci.c
index d17f3bf36f70..66c4e6d88fe3 100644
--- a/drivers/pci/hotplug/pciehp_pci.c
+++ b/drivers/pci/hotplug/pciehp_pci.c
@@ -58,6 +58,11 @@ int pciehp_configure_device(struct controller *ctrl)
goto out;
}
 
+   if (pci_movable_bars_enabled()) {
+   pci_rescan_bus(parent);
+   goto out;
+   }
+
for_each_pci_bridge(dev, parent)
pci_hp_add_bridge(dev);
 
-- 
2.21.0



[PATCH v5 22/23] PCI/portdrv: Declare support of movable BARs

2019-08-16 Thread Sergey Miroshnichenko
The switch's BARs are not used by the portdrv driver, but they are still
considered immovable until the .rescan_prepare() and .rescan_done()
hooks are added. Add these hooks to increase the chances of allocating
new BARs.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/pcie/portdrv_pci.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index 0a87091a0800..9dbddc7faaa7 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -197,6 +197,14 @@ static const struct pci_error_handlers 
pcie_portdrv_err_handler = {
.resume = pcie_portdrv_err_resume,
 };
 
+static void pcie_portdrv_rescan_prepare(struct pci_dev *pdev)
+{
+}
+
+static void pcie_portdrv_rescan_done(struct pci_dev *pdev)
+{
+}
+
 static struct pci_driver pcie_portdriver = {
.name   = "pcieport",
	.id_table   = &port_pci_ids[0],
@@ -207,6 +215,9 @@ static struct pci_driver pcie_portdriver = {
 
	.err_handler= &pcie_portdrv_err_handler,
 
+   .rescan_prepare = pcie_portdrv_rescan_prepare,
+   .rescan_done= pcie_portdrv_rescan_done,
+
.driver.pm  = PCIE_PORTDRV_PM_OPS,
 };
 
-- 
2.21.0



[PATCH v5 21/23] nvme-pci: Handle movable BARs

2019-08-16 Thread Sergey Miroshnichenko
Hotplugged devices can affect the existing ones by moving their BARs. The
PCI subsystem will inform the NVME driver about this by invoking the
.rescan_prepare() and .rescan_done() hooks, so the BARs can be re-mapped.

Tested under the "randrw" mode of the fio tool. Before the hotplugging:

  % sudo cat /proc/iomem
  ...
3fe8-3fe8007f : PCI Bus 0020:0b
  3fe8-3fe8007f : PCI Bus 0020:18
3fe8-3fe8000f : 0020:18:00.0
  3fe8-3fe8000f : nvme
3fe80010-3fe80017 : 0020:18:00.0
  ...

, then another NVME drive was hot-added, so BARs of the 0020:18:00.0 are
moved:

  % sudo cat /proc/iomem
...
3fe8-3fe800ff : PCI Bus 0020:0b
  3fe8-3fe8007f : PCI Bus 0020:10
3fe8-3fe83fff : 0020:10:00.0
  3fe8-3fe83fff : nvme
3fe80001-3fe80001 : 0020:10:00.0
  3fe80080-3fe800ff : PCI Bus 0020:18
3fe80080-3fe8008f : 0020:18:00.0
  3fe80080-3fe8008f : nvme
3fe80090-3fe80097 : 0020:18:00.0
...

During the rescanning, both READ and WRITE speeds drop to zero for a while
due to the driver's pause, then restore.

Cc: linux-n...@lists.infradead.org
Cc: Christoph Hellwig 
Signed-off-by: Sergey Miroshnichenko 
---
 drivers/nvme/host/pci.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index db160cee42ad..a805d80082ca 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1645,7 +1645,7 @@ static int nvme_remap_bar(struct nvme_dev *dev, unsigned 
long size)
 {
struct pci_dev *pdev = to_pci_dev(dev->dev);
 
-   if (size <= dev->bar_mapped_size)
+   if (dev->bar && size <= dev->bar_mapped_size)
return 0;
if (size > pci_resource_len(pdev, 0))
return -ENOMEM;
@@ -2980,6 +2980,23 @@ static void nvme_error_resume(struct pci_dev *pdev)
	flush_work(&dev->ctrl.reset_work);
 }
 
+static void nvme_rescan_prepare(struct pci_dev *pdev)
+{
+   struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+   nvme_dev_disable(dev, false);
+   nvme_dev_unmap(dev);
+   dev->bar = NULL;
+}
+
+static void nvme_rescan_done(struct pci_dev *pdev)
+{
+   struct nvme_dev *dev = pci_get_drvdata(pdev);
+
+   nvme_dev_map(dev);
+   nvme_reset_ctrl_sync(>ctrl);
+}
+
 static const struct pci_error_handlers nvme_err_handler = {
.error_detected = nvme_error_detected,
.slot_reset = nvme_slot_reset,
@@ -3049,6 +3066,8 @@ static struct pci_driver nvme_driver = {
 #endif
.sriov_configure = pci_sriov_configure_simple,
	.err_handler= &nvme_err_handler,
+   .rescan_prepare = nvme_rescan_prepare,
+   .rescan_done= nvme_rescan_done,
 };
 
 static int __init nvme_init(void)
-- 
2.21.0



[PATCH v5 20/23] PCI: hotplug: movable BARs: Enable the feature by default

2019-08-16 Thread Sergey Miroshnichenko
This is the last patch in the series which implements the essentials of the
Movable BARs feature, so it is turned on by default now. Tested on:

 - x86_64 with "pci=realloc,assign-busses,use_crs,pcie_bus_peer2peer"
   command line argument;
 - POWER8 PowerNV+PHB3 ppc64le with "pci=realloc,pcie_bus_peer2peer".

In case of problems it can still be overridden by the following command
line option:

pcie_movable_bars=off

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/pci-driver.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index d11909e79263..a8124e47bf6e 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1688,8 +1688,6 @@ static int __init pci_driver_init(void)
 {
int ret;
 
-   pci_add_flags(PCI_IMMOVABLE_BARS);
-
ret = bus_register(_bus_type);
if (ret)
return ret;
-- 
2.21.0



[PATCH v5 19/23] PCI: hotplug: Configure MPS for hot-added bridges during bus rescan

2019-08-16 Thread Sergey Miroshnichenko
Ensure that MPS settings are set up for bridges which are discovered
during manually triggered rescan via sysfs. This sequence of bridge
init (using pci_rescan_bus()) will be used for pciehp hot-add events
when BARs are movable.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/probe.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 5f52a19738aa..4bb10d27cb3a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3688,7 +3688,7 @@ static void pci_reassign_root_bus_resources(struct 
pci_bus *root)
 unsigned int pci_rescan_bus(struct pci_bus *bus)
 {
unsigned int max;
-   struct pci_bus *root = bus;
+   struct pci_bus *root = bus, *child;
 
while (!pci_is_root_bus(root))
root = root->parent;
@@ -3708,6 +3708,9 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
pci_assign_unassigned_bus_resources(bus);
}
 
+   list_for_each_entry(child, >children, node)
+   pcie_bus_configure_settings(child);
+
pci_bus_add_devices(bus);
 
return max;
-- 
2.21.0



[PATCH v5 18/23] powerpc/pci: Handle BAR movement

2019-08-16 Thread Sergey Miroshnichenko
Add pcibios_rescan_prepare()/_done() hooks for the powerpc platform. Now if
the device's driver supports movable BARs, pcibios_rescan_prepare() will be
called after the device is stopped, and pcibios_rescan_done() - before it
resumes. There are no memory requests to this device between the hooks, so
it is safe to rebuild the EEH address cache during that.

CC: Oliver O'Halloran 
Signed-off-by: Sergey Miroshnichenko 
---
 arch/powerpc/kernel/pci-hotplug.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/powerpc/kernel/pci-hotplug.c 
b/arch/powerpc/kernel/pci-hotplug.c
index 0b0cf8168b47..18cf13bba228 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -144,3 +144,13 @@ void pci_hp_add_devices(struct pci_bus *bus)
pcibios_finish_adding_to_bus(bus);
 }
 EXPORT_SYMBOL_GPL(pci_hp_add_devices);
+
+void pcibios_rescan_prepare(struct pci_dev *pdev)
+{
+   eeh_addr_cache_rmv_dev(pdev);
+}
+
+void pcibios_rescan_done(struct pci_dev *pdev)
+{
+   eeh_addr_cache_insert_dev(pdev);
+}
-- 
2.21.0



[PATCH v5 15/23] PCI: hotplug: movable BARs: Assign fixed and immovable BARs before others

2019-08-16 Thread Sergey Miroshnichenko
Reassign resources during rescan in two steps: first the fixed/immovable
BARs and bridge windows that have fixed areas, so the movable ones will not
steal these reserved areas; then the rest - so the movable BARs will divide
the rest of the space.

With this change, pci_assign_resource() is now able to assign all types of
BARs, so pdev_assign_fixed_resources() became unused and has been removed.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/pci.h   |  2 ++
 drivers/pci/setup-bus.c | 79 -
 drivers/pci/setup-res.c |  8 +++--
 3 files changed, 55 insertions(+), 34 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 12add575faf1..e1fcc46f9c40 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -260,6 +260,8 @@ void pci_disable_bridge_window(struct pci_dev *dev);
 
 bool pci_dev_movable_bars_supported(struct pci_dev *dev);
 
+int assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r);
+
 /* PCIe link information */
 #define PCIE_SPEED2STR(speed) \
((speed) == PCIE_SPEED_16_0GT ? "16 GT/s" : \
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6f12411357f3..c7b7e30c6284 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -38,6 +38,15 @@ struct pci_dev_resource {
unsigned long flags;
 };
 
+enum assign_step {
+   assign_fixed_resources,
+   assign_float_resources,
+};
+
+static void _assign_requested_resources_sorted(struct list_head *head,
+  struct list_head *fail_head,
+  enum assign_step step);
+
 static void free_list(struct list_head *head)
 {
struct pci_dev_resource *dev_res, *tmp;
@@ -278,19 +287,48 @@ static void reassign_resources_sorted(struct list_head 
*realloc_head,
  */
 static void assign_requested_resources_sorted(struct list_head *head,
 struct list_head *fail_head)
+{
+	_assign_requested_resources_sorted(head, fail_head, assign_fixed_resources);
+	_assign_requested_resources_sorted(head, fail_head, assign_float_resources);
+}
+
+static void _assign_requested_resources_sorted(struct list_head *head,
+  struct list_head *fail_head,
+  enum assign_step step)
 {
struct resource *res;
struct pci_dev_resource *dev_res;
int idx;
 
list_for_each_entry(dev_res, head, list) {
+   bool is_fixed = false;
+
if (!pci_dev_bars_enabled(dev_res->dev))
continue;
 
res = dev_res->res;
+   if (!resource_size(res))
+   continue;
+
	idx = res - &dev_res->dev->resource[0];
-   if (resource_size(res) &&
-   pci_assign_resource(dev_res->dev, idx)) {
+
+   if (idx < PCI_BRIDGE_RESOURCES) {
+   is_fixed = (res->flags & IORESOURCE_PCI_FIXED) ||
+   !pci_dev_movable_bars_supported(dev_res->dev);
+   } else {
+   int b_res_idx = pci_get_bridge_resource_idx(res);
+   struct resource *fixed_res =
+				&dev_res->dev->subordinate->immovable_range[b_res_idx];
+
+   is_fixed = (fixed_res->start < fixed_res->end);
+   }
+
+   if (assign_fixed_resources == step && !is_fixed)
+   continue;
+   else if (assign_float_resources == step && is_fixed)
+   continue;
+
+   if (pci_assign_resource(dev_res->dev, idx)) {
if (fail_head) {
/*
 * If the failed resource is a ROM BAR and
@@ -1336,7 +1374,7 @@ void pci_bus_size_bridges(struct pci_bus *bus)
 }
 EXPORT_SYMBOL(pci_bus_size_bridges);
 
-static void assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
+int assign_fixed_resource_on_bus(struct pci_bus *b, struct resource *r)
 {
int i;
struct resource *parent_r;
@@ -1353,35 +1391,14 @@ static void assign_fixed_resource_on_bus(struct pci_bus 
*b, struct resource *r)
!(r->flags & IORESOURCE_PREFETCH))
continue;
 
-   if (resource_contains(parent_r, r))
-   request_resource(parent_r, r);
-   }
-}
-
-/*
- * Try to assign any resources marked as IORESOURCE_PCI_FIXED, as they are
- * skipped by pbus_assign_resources_sorted().
- */
-static void pdev_assign_fixed_resources(struct pci_dev *dev)
-{
-   int i;
-
-   for (i = 0; i <  PCI_NUM_RESOURCES; i++) {
-   struct pci_bus *b;
-		struct resource *r = &dev->resource[i];
-
-   if (r->parent || !(r->flags & IORESOURCE_PCI_FIXED) ||
-   !(r->flags & (IORESOURCE_IO | 

[PATCH v5 17/23] powerpc/pci: Fix crash with enabled movable BARs

2019-08-16 Thread Sergey Miroshnichenko
Add a check for the IORESOURCE_UNSET flag to skip the released BARs.

CC: Alexey Kardashevskiy 
Signed-off-by: Sergey Miroshnichenko 
---
 arch/powerpc/platforms/powernv/pci-ioda.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index d8080558d020..362eac42f463 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -2986,7 +2986,8 @@ static void pnv_ioda_setup_pe_res(struct pnv_ioda_pe *pe,
int index;
int64_t rc;
 
-   if (!res || !res->flags || res->start > res->end)
+   if (!res || !res->flags || res->start > res->end ||
+   (res->flags & IORESOURCE_UNSET))
return;
 
if (res->flags & IORESOURCE_IO) {
-- 
2.21.0



[PATCH v5 16/23] PCI: hotplug: movable BARs: Don't reserve IO/mem bus space

2019-08-16 Thread Sergey Miroshnichenko
A hotplugged bridge with many hotplug-capable ports may request
reserving more IO space than the machine has. This could be overridden
with the "hpiosize=" kernel argument though.

But when BARs are movable, there is no need to reserve space anymore:
new BARs are allocated not from reserved gaps, but via rearranging the
existing BARs. Requesting a precise amount of space for bridge windows
increases the chances of adding the new bridge successfully.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/setup-bus.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index c7b7e30c6284..7d64ec8e7088 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1287,7 +1287,7 @@ void __pci_bus_size_bridges(struct pci_bus *bus, struct 
list_head *realloc_head)
 
case PCI_HEADER_TYPE_BRIDGE:
pci_bridge_check_ranges(bus);
-   if (bus->self->is_hotplug_bridge) {
+		if (bus->self->is_hotplug_bridge && !pci_movable_bars_enabled()) {
additional_io_size  = pci_hotplug_io_size;
additional_mem_size = pci_hotplug_mem_size;
}
-- 
2.21.0



[PATCH v5 13/23] PCI: Make sure bridge windows include their fixed BARs

2019-08-16 Thread Sergey Miroshnichenko
When the time comes to select a start address for a bridge window during
the root bus rescan, it should not be just the lowest possible address: the
window must cover all the underlying fixed and immovable BARs. The lowest
address that satisfies this requirement is the .realloc_range field of
struct pci_bus, which is calculated during the preparation to the rescan.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/bus.c   |  2 +-
 drivers/pci/setup-res.c | 28 ++--
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 495059d923f7..7aae830751e9 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -192,7 +192,7 @@ static int pci_bus_alloc_from_region(struct pci_bus *bus, 
struct resource *res,
 * this is an already-configured bridge window, its start
 * overrides "min".
 */
-   if (avail.start)
+   if (min_used < avail.start)
min_used = avail.start;
 
max = avail.end;
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index 732d18f60f1b..7357bcc12a53 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -248,9 +248,20 @@ static int __pci_assign_resource(struct pci_bus *bus, 
struct pci_dev *dev,
struct resource *res = dev->resource + resno;
resource_size_t min;
int ret;
+   resource_size_t start = (resource_size_t)-1;
+   resource_size_t end = 0;
 
min = (res->flags & IORESOURCE_IO) ? PCIBIOS_MIN_IO : PCIBIOS_MIN_MEM;
 
+   if (dev->subordinate && resno >= PCI_BRIDGE_RESOURCES) {
+   struct pci_bus *child_bus = dev->subordinate;
+   int b_resno = resno - PCI_BRIDGE_RESOURCES;
+		struct resource *immovable_range = &child_bus->immovable_range[b_resno];
+
+   if (immovable_range->start < immovable_range->end)
+   min = child_bus->realloc_range[b_resno].start;
+   }
+
/*
 * First, try exact prefetching match.  Even if a 64-bit
 * prefetchable bridge window is below 4GB, we can't put a 32-bit
@@ -262,7 +273,7 @@ static int __pci_assign_resource(struct pci_bus *bus, 
struct pci_dev *dev,
 IORESOURCE_PREFETCH | IORESOURCE_MEM_64,
 pcibios_align_resource, dev);
if (ret == 0)
-   return 0;
+   goto check_fixed;
 
/*
 * If the prefetchable window is only 32 bits wide, we can put
@@ -274,7 +285,7 @@ static int __pci_assign_resource(struct pci_bus *bus, 
struct pci_dev *dev,
 IORESOURCE_PREFETCH,
 pcibios_align_resource, dev);
if (ret == 0)
-   return 0;
+   goto check_fixed;
}
 
/*
@@ -287,6 +298,19 @@ static int __pci_assign_resource(struct pci_bus *bus, 
struct pci_dev *dev,
ret = pci_bus_alloc_resource(bus, res, size, align, min, 0,
 pcibios_align_resource, dev);
 
+check_fixed:
+   if (ret == 0 && start < end) {
+   if (res->start > start || res->end < end) {
+			dev_err(&dev->dev, "fixed area 0x%llx-0x%llx for %s doesn't fit in the allocated %pR (0x%llx-0x%llx)",
+				(unsigned long long)start, (unsigned long long)end,
+				dev_name(&dev->dev),
+				res, (unsigned long long)res->start,
+				(unsigned long long)res->end);
+   release_resource(res);
+   return -1;
+   }
+   }
+
return ret;
 }
 
-- 
2.21.0



[PATCH v5 14/23] PCI: Fix assigning the fixed prefetchable resources

2019-08-16 Thread Sergey Miroshnichenko
Allow matching IORESOURCE_PCI_FIXED prefetchable BARs to non-prefetchable
windows, so they follow the same rules as immovable BARs.
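The matching rule above can be sketched as a small predicate. This is an
illustrative toy model: the flag values below are stand-ins, not the real
IORESOURCE_* constants from include/linux/ioport.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in flag values (not the real IORESOURCE_* masks) */
#define TOY_IO       0x1UL
#define TOY_MEM      0x2UL
#define TOY_TYPE     0x3UL	/* type bits */
#define TOY_PREFETCH 0x4UL

/*
 * The rule described above: a window only hosts resources of its own type,
 * and a prefetchable window only hosts prefetchable resources, while a
 * non-prefetchable window may host either kind.
 */
static bool window_accepts(unsigned long win_flags, unsigned long res_flags)
{
	if ((win_flags & TOY_TYPE) != (res_flags & TOY_TYPE))
		return false;
	if ((win_flags & TOY_PREFETCH) && !(res_flags & TOY_PREFETCH))
		return false;
	return true;
}
```

The asymmetry mirrors the diff: the prefetch check is only applied to the
window's flags, so a fixed prefetchable BAR may land in a non-prefetchable
window, but not the other way around.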

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/setup-bus.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 586aaa9578b2..6f12411357f3 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1340,15 +1340,20 @@ static void assign_fixed_resource_on_bus(struct pci_bus 
*b, struct resource *r)
 {
int i;
struct resource *parent_r;
-   unsigned long mask = IORESOURCE_IO | IORESOURCE_MEM |
-IORESOURCE_PREFETCH;
+   unsigned long mask = IORESOURCE_TYPE_BITS;
 
pci_bus_for_each_resource(b, parent_r, i) {
if (!parent_r)
continue;
 
-   if ((r->flags & mask) == (parent_r->flags & mask) &&
-   resource_contains(parent_r, r))
+   if ((r->flags & mask) != (parent_r->flags & mask))
+   continue;
+
+   if (parent_r->flags & IORESOURCE_PREFETCH &&
+   !(r->flags & IORESOURCE_PREFETCH))
+   continue;
+
+   if (resource_contains(parent_r, r))
request_resource(parent_r, r);
}
 }
-- 
2.21.0



[PATCH v5 12/23] PCI: hotplug: movable BARs: Compute limits for relocated bridge windows

2019-08-16 Thread Sergey Miroshnichenko
With enabled movable BARs, bridge windows are recalculated during each pci
rescan. Some of the BARs below the bridge may be fixed/immovable: these
areas are represented by the .immovable_range field in struct pci_bus.

If a bridge window size is equal to its immovable range, it can only be
assigned to the start of this range. But if a bridge window size is larger,
and this difference in size is denoted as "delta", the window can start
from (immovable_range.start - delta) to (immovable_range.start), and it can
end from (immovable_range.end) to (immovable_range.end + delta). This range
(the new .realloc_range field in struct pci_bus) must then be compared with
immovable ranges of neighbouring bridges to guarantee no intersections.

This patch only calculates valid ranges for reallocated bridges during pci
rescan, and the next one will make use of these values during allocation.
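The margin arithmetic described above can be sketched as follows. This is a
toy model, not the kernel code: struct range and the helper name are
illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;

struct range {
	resource_size_t start, end;
};

/*
 * Given the immovable range that a bridge window of window_size must keep
 * covered, return the lowest address the window may start at and the
 * highest address it may end at, as described in the commit message.
 */
static struct range realloc_range(struct range imm, resource_size_t window_size)
{
	struct range r;

	/* Lowest start that still lets the window reach imm.end */
	r.start = imm.end - window_size + 1;
	/* Highest end that still lets the window reach imm.start */
	r.end = imm.start + window_size - 1;

	/* The window must always contain the whole immovable range */
	if (r.start > imm.start)
		r.start = imm.start;
	if (r.end < imm.end)
		r.end = imm.end;

	return r;
}
```

For example, a window of size 0x3000 covering an immovable range
0x2000-0x2fff may start anywhere from 0x0 up to 0x2000; when the window size
equals the immovable range's size, the "delta" is zero and only one position
is possible.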

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/setup-bus.c | 67 +
 include/linux/pci.h |  6 
 2 files changed, 73 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 420510a1a257..586aaa9578b2 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1819,6 +1819,72 @@ static enum enable_type pci_realloc_detect(struct 
pci_bus *bus,
 }
 #endif
 
+/*
+ * Calculate the address margins where the bridge windows may be allocated
+ * to fit all the fixed and immovable BARs beneath.
+ */
+static void pci_bus_update_realloc_range(struct pci_bus *bus)
+{
+   struct pci_dev *dev;
+   struct pci_bus *parent = bus->parent;
+   int idx;
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+   if (dev->subordinate)
+   pci_bus_update_realloc_range(dev->subordinate);
+
+   if (!parent || !bus->self)
+   return;
+
+   for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+		struct resource *immovable_range = &bus->immovable_range[idx];
+   resource_size_t window_size = resource_size(bus->resource[idx]);
+   resource_size_t realloc_start, realloc_end;
+
+   bus->realloc_range[idx].start = 0;
+   bus->realloc_range[idx].end = 0;
+
+		/* Check if there are any immovable BARs under the bridge */
+		if (immovable_range->start >= immovable_range->end)
+			continue;
+
+		/* The lowest possible address where the bridge window can start */
+		realloc_start = immovable_range->end - window_size + 1;
+		/* The highest possible address where the bridge window can end */
+		realloc_end = immovable_range->start + window_size - 1;
+
+   if (realloc_start > immovable_range->start)
+   realloc_start = immovable_range->start;
+
+   if (realloc_end < immovable_range->end)
+   realloc_end = immovable_range->end;
+
+   /*
+		 * Check that the realloc range doesn't intersect with hard fixed
+		 * ranges of neighboring bridges
+		 */
+		list_for_each_entry(dev, &parent->devices, bus_list) {
+   struct pci_bus *neighbor = dev->subordinate;
+   struct resource *n_imm_range;
+
+   if (!neighbor || neighbor == bus)
+   continue;
+
+			n_imm_range = &neighbor->immovable_range[idx];
+
+   if (n_imm_range->start >= n_imm_range->end)
+   continue;
+
+   if (n_imm_range->end < immovable_range->start &&
+   n_imm_range->end > realloc_start)
+   realloc_start = n_imm_range->end;
+   }
+
+   bus->realloc_range[idx].start = realloc_start;
+   bus->realloc_range[idx].end = realloc_end;
+   }
+}
+
 /*
  * First try will not touch PCI bridge res.
  * Second and later try will clear small leaf bridge res.
@@ -1838,6 +1904,7 @@ void pci_assign_unassigned_root_bus_resources(struct 
pci_bus *bus)
 
if (pci_movable_bars_enabled()) {
__pci_bus_size_bridges(bus, NULL);
+   pci_bus_update_realloc_range(bus);
__pci_bus_assign_resources(bus, NULL, NULL);
 
goto dump;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index efafbf816fe6..bf6638cf2525 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -587,6 +587,12 @@ struct pci_bus {
 */
struct resource immovable_range[PCI_BRIDGE_RESOURCE_NUM];
 
+   /*
+	 * Acceptable address range where the bridge window may reside,
+	 * considering its size, so it will cover all the fixed and immovable
+	 * BARs below.
+	 */
+   struct resource realloc_range[PCI_BRIDGE_RESOURCE_NUM];
+
struct pci_ops  *ops;   /* Configuration access functions 

[PATCH v5 11/23] PCI: hotplug: movable BARs: Calculate immovable parts of bridge windows

2019-08-16 Thread Sergey Miroshnichenko
When movable BARs are enabled, and if a bridge contains a device with fixed
(IORESOURCE_PCI_FIXED) or immovable BARs, the corresponding windows can't
be moved too far away from their original positions - they must still
contain all the fixed/immovable BARs, like this:

  1) Window position before a bus rescan:

  | <--root bridge window--> |
  |  |
  | | <-- bridge window--> | |
  | | movable BARs | **fixed BAR** | |

  2) Possible valid outcome after rescan and move:

  | <--root bridge window--> |
  |  |
  || <-- bridge window--> |  |
  || **fixed BAR** | Movable BARs |  |

An immovable area of a bridge (separate for IO, MEM and MEM64 window types)
is a range that covers all the fixed and immovable BARs of direct children,
and all the fixed area of children bridges:

  | <--root bridge window--> |
  |  |
  |  | <--  bridge window level 1--> |   |
  |  |  immovable area of this bridge window |   |
  |  |   |   |
  |  | **fixed BAR**  | <--  bridge window level 2--> | BARs |   |
  |  || * fixed area of this bridge * |  |   |
  |  ||   |  |   |
  |  || ***fixed BAR*** |   | ***fixed BAR*** |  |   |

To store these areas, the .immovable_range field has been added to struct
pci_bus. It is filled recursively from leaves to the root before a rescan.

Also make pbus_size_io() and pbus_size_mem() return either their usual
result or the size of the immovable range of the corresponding type,
whichever is larger.
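The accumulation of an immovable area from the ranges beneath a bridge can
be sketched like this (a toy model with an illustrative helper name; an
empty range is encoded as start >= end, matching the zero-initialized
immovable_range[] fields):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t resource_size_t;

struct range {
	resource_size_t start, end;
};

/*
 * Grow the accumulated immovable area "acc" so it also covers "add",
 * i.e. take the union of the two address ranges.  Empty ranges
 * (start >= end) are skipped, and an empty accumulator is simply
 * replaced by the first non-empty range.
 */
static void range_extend(struct range *acc, struct range add)
{
	if (add.start >= add.end)
		return;			/* nothing to add */

	if (acc->start >= acc->end) {	/* accumulator still empty */
		*acc = add;
		return;
	}

	if (add.start < acc->start)
		acc->start = add.start;
	if (add.end > acc->end)
		acc->end = add.end;
}
```

Folding this helper over all fixed/immovable child BARs and child bridges'
fixed areas yields the single covering range per window type that the
commit message describes.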

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/pci.h   | 14 +++
 drivers/pci/probe.c | 88 +
 drivers/pci/setup-bus.c | 17 
 include/linux/pci.h |  6 +++
 4 files changed, 125 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 53249cbc21b6..12add575faf1 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -371,6 +371,20 @@ static inline bool pci_dev_is_disconnected(const struct 
pci_dev *dev)
return dev->error_state == pci_channel_io_perm_failure;
 }
 
+static inline int pci_get_bridge_resource_idx(struct resource *r)
+{
+   int idx = 1;
+
+   if (r->flags & IORESOURCE_IO)
+   idx = 0;
+   else if (!(r->flags & IORESOURCE_PREFETCH))
+   idx = 1;
+   else if (r->flags & IORESOURCE_MEM_64)
+   idx = 2;
+
+   return idx;
+}
+
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
 #define PCI_DEV_DISABLED_BARS 1
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index bf0a7d1c5d09..5f52a19738aa 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -550,6 +550,7 @@ void pci_read_bridge_bases(struct pci_bus *child)
 static struct pci_bus *pci_alloc_bus(struct pci_bus *parent)
 {
struct pci_bus *b;
+   int idx;
 
b = kzalloc(sizeof(*b), GFP_KERNEL);
if (!b)
@@ -566,6 +567,11 @@ static struct pci_bus *pci_alloc_bus(struct pci_bus 
*parent)
if (parent)
b->domain_nr = parent->domain_nr;
 #endif
+   for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+   b->immovable_range[idx].start = 0;
+   b->immovable_range[idx].end = 0;
+   }
+
return b;
 }
 
@@ -3512,6 +3518,87 @@ static void pci_setup_bridges(struct pci_bus *bus)
pci_setup_bridge(bus);
 }
 
+static void pci_bus_update_immovable_range(struct pci_bus *bus)
+{
+   struct pci_dev *dev;
+   int idx;
+   resource_size_t start, end;
+
+   for (idx = 0; idx < PCI_BRIDGE_RESOURCE_NUM; ++idx) {
+   bus->immovable_range[idx].start = 0;
+   bus->immovable_range[idx].end = 0;
+   }
+
+	list_for_each_entry(dev, &bus->devices, bus_list)
+   if (dev->subordinate)
+   pci_bus_update_immovable_range(dev->subordinate);
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+   int i;
+   bool dev_is_movable = pci_dev_movable_bars_supported(dev);
+   struct pci_bus *child = dev->subordinate;
+
+   for (i = 0; i < PCI_BRIDGE_RESOURCES; ++i) {
+			struct resource *r = &dev->resource[i];
+
+			if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+   continue;
+
+   if (!dev_is_movable 

[PATCH v5 09/23] PCI: Prohibit assigning BARs and bridge windows to non-direct parents

2019-08-16 Thread Sergey Miroshnichenko
When movable BARs are enabled, the feature of resource relocating from
commit 2bbc6942273b5 ("PCI : ability to relocate assigned pci-resources")
is not used. Instead, the inability to assign a resource is used as a signal
to retry BAR assignment with another configuration of bridge windows.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/setup-bus.c |  2 ++
 drivers/pci/setup-res.c | 12 
 2 files changed, 14 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 2c250efca512..aee330047121 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1356,6 +1356,8 @@ static void pdev_assign_fixed_resources(struct pci_dev 
*dev)
while (b && !r->parent) {
assign_fixed_resource_on_bus(b, r);
b = b->parent;
+   if (!r->parent && pci_movable_bars_enabled())
+   break;
}
}
 }
diff --git a/drivers/pci/setup-res.c b/drivers/pci/setup-res.c
index d8ca40a97693..732d18f60f1b 100644
--- a/drivers/pci/setup-res.c
+++ b/drivers/pci/setup-res.c
@@ -298,6 +298,18 @@ static int _pci_assign_resource(struct pci_dev *dev, int 
resno,
 
bus = dev->bus;
	while ((ret = __pci_assign_resource(bus, dev, resno, size, min_align))) {
+   if (pci_movable_bars_enabled()) {
+   if (resno >= PCI_BRIDGE_RESOURCES &&
+   resno <= PCI_BRIDGE_RESOURCE_END) {
+   struct resource *res = dev->resource + resno;
+
+   res->start = 0;
+   res->end = 0;
+   res->flags = 0;
+   }
+   break;
+   }
+
if (!bus->parent || !bus->self->transparent)
break;
bus = bus->parent;
-- 
2.21.0



[PATCH v5 10/23] PCI: hotplug: movable BARs: Try to assign unassigned resources only once

2019-08-16 Thread Sergey Miroshnichenko
With BAR movement enabled, BARs and bridge windows can only be assigned to
their direct parents, so there can be only one variant of the resource
tree; thus every retry within pci_assign_unassigned_root_bus_resources()
will result in the same tree, and it is enough to try just once.

In case of failure, pci_reassign_root_bus_resources() disables BARs for
one of the hotplugged devices and tries the assignment again.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/setup-bus.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index aee330047121..33f709095675 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1819,6 +1819,13 @@ void pci_assign_unassigned_root_bus_resources(struct 
pci_bus *bus)
int pci_try_num = 1;
enum enable_type enable_local;
 
+   if (pci_movable_bars_enabled()) {
+   __pci_bus_size_bridges(bus, NULL);
+   __pci_bus_assign_resources(bus, NULL, NULL);
+
+   goto dump;
+   }
+
/* Don't realloc if asked to do so */
enable_local = pci_realloc_detect(bus, pci_realloc_enable);
if (pci_realloc_enabled(enable_local)) {
-- 
2.21.0



[PATCH v5 07/23] PCI: hotplug: movable BARs: Don't allow added devices to steal resources

2019-08-16 Thread Sergey Miroshnichenko
When movable BARs are enabled, the PCI subsystem at first releases all the
bridge windows and then attempts to assign resources both to previously
working devices and to the newly hotplugged ones, with the same priority.

If a hotplugged device gets its BARs first, this may lead to lack of space
for already working devices, which is unacceptable. If that happens, mark
one of the new devices with the newly introduced flag PCI_DEV_DISABLED_BARS
(if it is not yet marked) and retry the BAR recalculation.

The worst case would be no BARs for hotplugged devices, while all the rest
just continue working.

The algorithm is simple and doesn't retry different subsets of hot-added
devices in case of a failure: e.g. if there is not enough space to allocate
BARs for both hotplugged devices A and B, but there is enough for just A,
then A will be marked with PCI_DEV_DISABLED_BARS first, and (after the next
failure) B as well. As a result, A will not get BARs even though it could.
This issue is only relevant when hotplugging two or more devices
simultaneously.

Add a new res_mask bitmask to the struct pci_dev for storing the indices of
assigned BARs.
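The greedy retry policy described above can be modeled with a toy allocator.
All names and the single-capacity fitting model below are illustrative; this
is not the kernel API.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of the retry policy: hotplugged devices are disabled one by
 * one, in discovery order, until the BARs of the remaining devices fit.
 */
struct toy_dev {
	unsigned long bar_size;
	bool hotplugged;
	bool disabled;		/* models PCI_DEV_DISABLED_BARS */
};

static bool try_assign_all(const struct toy_dev *devs, size_t n,
			   unsigned long capacity)
{
	unsigned long used = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (!devs[i].disabled)
			used += devs[i].bar_size;
	return used <= capacity;
}

/* Returns the number of devices that ended up with disabled BARs */
static int assign_with_retries(struct toy_dev *devs, size_t n,
			       unsigned long capacity)
{
	int disabled = 0;

	while (!try_assign_all(devs, n, capacity)) {
		bool found = false;
		size_t i;

		/* Sacrifice the first still-enabled hotplugged device */
		for (i = 0; i < n; i++) {
			if (devs[i].hotplugged && !devs[i].disabled) {
				devs[i].disabled = true;
				disabled++;
				found = true;
				break;
			}
		}
		if (!found)
			break;	/* only already-working devices remain */
	}
	return disabled;
}
```

With capacity for only one of two hotplugged devices, the first-discovered
device is sacrificed even when dropping the other alone would have sufficed,
which is exactly the limitation acknowledged above.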

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/pci.h   |  11 +
 drivers/pci/probe.c | 101 ++--
 drivers/pci/setup-bus.c |  15 ++
 include/linux/pci.h |   1 +
 4 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index a0ec696512eb..53249cbc21b6 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -373,6 +373,7 @@ static inline bool pci_dev_is_disconnected(const struct 
pci_dev *dev)
 
 /* pci_dev priv_flags */
 #define PCI_DEV_ADDED 0
+#define PCI_DEV_DISABLED_BARS 1
 
 static inline void pci_dev_assign_added(struct pci_dev *dev, bool added)
 {
@@ -384,6 +385,16 @@ static inline bool pci_dev_is_added(const struct pci_dev 
*dev)
	return test_bit(PCI_DEV_ADDED, &dev->priv_flags);
 }
 
+static inline void pci_dev_disable_bars(struct pci_dev *dev)
+{
+	assign_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags, true);
+}
+
+static inline bool pci_dev_bars_enabled(const struct pci_dev *dev)
+{
+	return !test_bit(PCI_DEV_DISABLED_BARS, &dev->priv_flags);
+}
+
 #ifdef CONFIG_PCIEAER
 #include 
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a26bf740e9ab..bf0a7d1c5d09 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3428,6 +3428,23 @@ void __weak pcibios_rescan_done(struct pci_dev *dev)
 {
 }
 
+static unsigned int pci_dev_count_res_mask(struct pci_dev *dev)
+{
+   unsigned int res_mask = 0;
+   int i;
+
+   for (i = 0; i < PCI_BRIDGE_RESOURCES; i++) {
+		struct resource *r = &dev->resource[i];
+
+   if (!r->flags || (r->flags & IORESOURCE_UNSET) || !r->parent)
+   continue;
+
+   res_mask |= (1 << i);
+   }
+
+   return res_mask;
+}
+
 static void pci_bus_rescan_prepare(struct pci_bus *bus)
 {
struct pci_dev *dev;
@@ -3438,6 +3455,8 @@ static void pci_bus_rescan_prepare(struct pci_bus *bus)
	list_for_each_entry(dev, &bus->devices, bus_list) {
struct pci_bus *child = dev->subordinate;
 
+   dev->res_mask = pci_dev_count_res_mask(dev);
+
if (child)
pci_bus_rescan_prepare(child);
 
@@ -3481,7 +3500,7 @@ static void pci_setup_bridges(struct pci_bus *bus)
	list_for_each_entry(dev, &bus->devices, bus_list) {
struct pci_bus *child;
 
-   if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
+   if (!pci_dev_is_added(dev) || !pci_dev_bars_enabled(dev))
continue;
 
child = dev->subordinate;
@@ -3493,6 +3512,83 @@ static void pci_setup_bridges(struct pci_bus *bus)
pci_setup_bridge(bus);
 }
 
+static struct pci_dev *pci_find_next_new_device(struct pci_bus *bus)
+{
+   struct pci_dev *dev;
+
+   if (!bus)
+   return NULL;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+   struct pci_bus *child_bus = dev->subordinate;
+
+   if (!pci_dev_is_added(dev) && pci_dev_bars_enabled(dev))
+   return dev;
+
+   if (child_bus) {
+   struct pci_dev *next_new_dev;
+
+   next_new_dev = pci_find_next_new_device(child_bus);
+   if (next_new_dev)
+   return next_new_dev;
+   }
+   }
+
+   return NULL;
+}
+
+static bool pci_bus_check_all_bars_reassigned(struct pci_bus *bus)
+{
+   struct pci_dev *dev;
+   bool ret = true;
+
+   if (!bus)
+   return false;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+   struct pci_bus *child = dev->subordinate;
+   unsigned int res_mask = pci_dev_count_res_mask(dev);
+
+   if (!pci_dev_bars_enabled(dev))
+   continue;
+
+

[PATCH v5 08/23] PCI: Include fixed and immovable BARs into the bus size calculating

2019-08-16 Thread Sergey Miroshnichenko
The only difference between fixed/immovable and movable BARs is that the
former preserve their size and offset after being released (while the
corresponding struct resource is detached from a bridge window during a bus
rescan).

Include fixed/immovable BARs into the result of pbus_size_mem() and
prohibit assigning them to non-direct parents.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/setup-bus.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 1a731002ce18..2c250efca512 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1011,12 +1011,21 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned 
long mask,
	struct resource *r = &dev->resource[i];
resource_size_t r_size;
 
-   if (r->parent || (r->flags & IORESOURCE_PCI_FIXED) ||
+   if (r->parent ||
((r->flags & mask) != type &&
 (r->flags & mask) != type2 &&
 (r->flags & mask) != type3))
continue;
r_size = resource_size(r);
+
+   if ((r->flags & IORESOURCE_PCI_FIXED) ||
+   !pci_dev_movable_bars_supported(dev)) {
+   if (pci_movable_bars_enabled())
+   size += r_size;
+
+   continue;
+   }
+
 #ifdef CONFIG_PCI_IOV
/* Put SRIOV requested res to the optional list */
if (realloc_head && i >= PCI_IOV_RESOURCES &&
-- 
2.21.0



[PATCH v5 06/23] PCI: hotplug: movable BARs: Recalculate all bridge windows during rescan

2019-08-16 Thread Sergey Miroshnichenko
When the movable BARs feature is enabled and a rescan has been requested,
release all the bridge windows and recalculate them from scratch, taking
into account all kinds of BARs: fixed, immovable, movable, new.

This increases the chances to find a memory space to fit BARs for newly
hotplugged devices, especially if no/not enough gaps were reserved by the
BIOS/bootloader/firmware.

The last step of writing the recalculated windows to the bridges is done
by the new pci_setup_bridges() function.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/pci.h   |  1 +
 drivers/pci/probe.c | 22 ++
 drivers/pci/setup-bus.c | 16 
 3 files changed, 39 insertions(+)

diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index be7acc477c64..a0ec696512eb 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -253,6 +253,7 @@ void __pci_bus_assign_resources(const struct pci_bus *bus,
struct list_head *realloc_head,
struct list_head *fail_head);
 bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
+void pci_bus_release_root_bridge_resources(struct pci_bus *bus);
 
 void pci_reassigndev_resource_alignment(struct pci_dev *dev);
 void pci_disable_bridge_window(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 60e3b48d2251..a26bf740e9ab 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -3474,6 +3474,25 @@ static void pci_bus_rescan_done(struct pci_bus *bus)
pci_config_pm_runtime_put(bus->self);
 }
 
+static void pci_setup_bridges(struct pci_bus *bus)
+{
+   struct pci_dev *dev;
+
+	list_for_each_entry(dev, &bus->devices, bus_list) {
+   struct pci_bus *child;
+
+   if (!pci_dev_is_added(dev) || pci_dev_is_ignored(dev))
+   continue;
+
+   child = dev->subordinate;
+   if (child)
+   pci_setup_bridges(child);
+   }
+
+   if (bus->self)
+   pci_setup_bridge(bus);
+}
+
 /**
  * pci_rescan_bus - Scan a PCI bus for devices
  * @bus: PCI bus to scan
@@ -3495,8 +3514,11 @@ unsigned int pci_rescan_bus(struct pci_bus *bus)
pci_bus_rescan_prepare(root);
 
max = pci_scan_child_bus(root);
+
+   pci_bus_release_root_bridge_resources(root);
pci_assign_unassigned_root_bus_resources(root);
 
+   pci_setup_bridges(root);
pci_bus_rescan_done(root);
} else {
max = pci_scan_child_bus(bus);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 7c2c57f77c6f..04f626e1ac18 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1635,6 +1635,22 @@ static void pci_bus_release_bridge_resources(struct 
pci_bus *bus,
pci_bridge_release_resources(bus, type);
 }
 
+void pci_bus_release_root_bridge_resources(struct pci_bus *root_bus)
+{
+   int i;
+   struct resource *r;
+
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_IO, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus, IORESOURCE_MEM, whole_subtree);
+	pci_bus_release_bridge_resources(root_bus,
+					 IORESOURCE_MEM_64 | IORESOURCE_PREFETCH,
+					 whole_subtree);
+
+   pci_bus_for_each_resource(root_bus, r, i) {
+   pci_release_child_resources(root_bus, r);
+   }
+}
+
 static void pci_bus_dump_res(struct pci_bus *bus)
 {
struct resource *res;
-- 
2.21.0



[PATCH v5 05/23] PCI: hotplug: movable BARs: Fix reassigning the released bridge windows

2019-08-16 Thread Sergey Miroshnichenko
When a bridge window is temporarily released during the rescan, its old
size is not relevant anymore - it will be recreated by pbus_size_*(), so
its start value should be zero.

If such a window can't be reassigned, don't apply reset_resource(), so the
next retry may succeed.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/setup-bus.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 6cb8b293c576..7c2c57f77c6f 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -295,7 +295,8 @@ static void assign_requested_resources_sorted(struct 
list_head *head,
0 /* don't care */,
0 /* don't care */);
}
-   reset_resource(res);
+   if (!pci_movable_bars_enabled())
+   reset_resource(res);
}
}
 }
@@ -1579,8 +1580,8 @@ static void pci_bridge_release_resources(struct pci_bus 
*bus,
type = old_flags = r->flags & PCI_RES_TYPE_MASK;
pci_info(dev, "resource %d %pR released\n",
 PCI_BRIDGE_RESOURCES + idx, r);
-   /* Keep the old size */
-   r->end = resource_size(r) - 1;
+   /* Don't keep the old size if the bridge will be recalculated */
+		r->end = pci_movable_bars_enabled() ? 0 : (resource_size(r) - 1);
r->start = 0;
r->flags = 0;
 
-- 
2.21.0



[PATCH v5 04/23] PCI: Define PCI-specific version of the release_child_resources()

2019-08-16 Thread Sergey Miroshnichenko
If the bridge resources are released with the standard
release_child_resources(), it drops the .start fields of the children's
BARs to zero, but leaves the STARTALIGN flag set, which makes the resources
invalid for reassignment.

Some resources must preserve their offset and size: those marked with
PCI_FIXED, and the immovable ones - those bound by drivers without support
for the movable BARs feature.

Add the pci_release_child_resources() to replace release_child_resources()
in handling the described PCI-specific cases.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/setup-bus.c | 54 -
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 79b1fa6519be..6cb8b293c576 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -1482,6 +1482,55 @@ static void __pci_bridge_assign_resources(const struct 
pci_dev *bridge,
(IORESOURCE_IO | IORESOURCE_MEM | IORESOURCE_PREFETCH |\
 IORESOURCE_MEM_64)
 
+/*
+ * Similar to generic release_child_resources(), but aware of immovable BARs and
+ * PCI_FIXED and STARTALIGN flags
+ */
+static void pci_release_child_resources(struct pci_bus *bus, struct resource *r)
+{
+   struct pci_dev *dev;
+
+   if (!bus || !r)
+   return;
+
+   if (r->flags & IORESOURCE_PCI_FIXED)
+   return;
+
+   r->child = NULL;
+
+   list_for_each_entry(dev, &bus->devices, bus_list) {
+   int i;
+
+   for (i = 0; i < PCI_NUM_RESOURCES; i++) {
+   struct resource *tmp = &dev->resource[i];
+   resource_size_t size = resource_size(tmp);
+
+   if (!tmp->flags || tmp->parent != r)
+   continue;
+
+   tmp->parent = NULL;
+   tmp->sibling = NULL;
+
+   pci_release_child_resources(dev->subordinate, tmp);
+
+   if ((tmp->flags & IORESOURCE_PCI_FIXED) ||
+   !pci_dev_movable_bars_supported(dev)) {
+   pci_dbg(dev, "release immovable %pR (%s), keep its flags, base and size\n",
+   tmp, tmp->name);
+   continue;
+   }
+
+   pci_dbg(dev, "release %pR (%s)\n", tmp, tmp->name);
+
+   tmp->start = 0;
+   tmp->end = size - 1;
+
+   tmp->flags &= ~IORESOURCE_STARTALIGN;
+   tmp->flags |= IORESOURCE_SIZEALIGN;
+   }
+   }
+}
+
 static void pci_bridge_release_resources(struct pci_bus *bus,
 unsigned long type)
 {
@@ -1522,7 +1571,10 @@ static void pci_bridge_release_resources(struct pci_bus 
*bus,
return;
 
/* If there are children, release them all */
-   release_child_resources(r);
+   if (pci_movable_bars_enabled())
+   pci_release_child_resources(bus, r);
+   else
+   release_child_resources(r);
if (!release_resource(r)) {
type = old_flags = r->flags & PCI_RES_TYPE_MASK;
pci_info(dev, "resource %d %pR released\n",
-- 
2.21.0



[PATCH v5 03/23] PCI: hotplug: Add a flag for the movable BARs feature

2019-08-16 Thread Sergey Miroshnichenko
When hot-adding a device, the bridge may have windows not big enough (or
fragmented too much) for newly requested BARs to fit in. And expanding
these bridge windows may be impossible because blocked by "neighboring"
BARs and bridge windows.

Still, it may be possible to allocate a memory region for new BARs with the
following procedure:

1) notify all the drivers which support movable BARs to pause and release
   the BARs; the rest of the drivers are guaranteed that their devices will
   not get BARs moved;

2) release all the bridge windows except those of root bridges;

3) try to recalculate new bridge windows that will fit all the BAR types:
   - fixed;
   - immovable;
   - movable;
   - newly requested by hot-added devices;

4) if the previous step fails, disable BARs for one of the hot-added
   devices and retry from step 3;

5) notify the drivers, so they remap BARs and resume.
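As a side note, the retry loop in steps 3) and 4) can be sketched as a small
self-contained user-space model. The toy_* names and the single-window
"capacity" abstraction are invented for illustration only; the real sizing
done by pbus_size_*() is far more involved:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy device: one BAR of a given size, possibly hot-added. */
struct toy_dev {
	size_t bar_size;
	bool hot_added;
	bool bars_disabled;
};

static size_t required_space(const struct toy_dev *devs, int n)
{
	size_t total = 0;
	int i;

	for (i = 0; i < n; i++)
		if (!devs[i].bars_disabled)
			total += devs[i].bar_size;
	return total;
}

/* Returns true if an assignment was found (steps 3-4 of the procedure). */
static bool toy_recalculate(struct toy_dev *devs, int n, size_t window)
{
	int i;

	for (;;) {
		if (required_space(devs, n) <= window)
			return true;	/* step 3 succeeded */

		/* step 4: disable BARs of one hot-added device and retry */
		for (i = 0; i < n; i++) {
			if (devs[i].hot_added && !devs[i].bars_disabled) {
				devs[i].bars_disabled = true;
				break;
			}
		}
		if (i == n)
			return false;	/* nothing left to drop */
	}
}
```

Note that fixed/immovable BARs (the first two categories in step 3) are never
dropped in this model - only the hot-added ones are candidates, which matches
the guarantee given to drivers that don't support the feature.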

With this, PCI hotplug no longer requires prior reservation of memory by
the BIOS/bootloader/firmware.

Drivers indicate their support of movable BARs by implementing the new
.rescan_prepare() and .rescan_done() hooks in the struct pci_driver. All
device's activity must be paused during a rescan, and iounmap()+ioremap()
must be applied to every used BAR.

The platform may also need to prepare for BAR movement, so new hooks are
added: pcibios_rescan_prepare(pci_dev) and pcibios_rescan_done(pci_dev).

This patch is a preparation for future patches with actual implementation,
and for now it just does the following:
 - declares the feature;
 - defines pci_movable_bars_enabled(), pci_dev_movable_bars_supported(dev);
 - invokes the .rescan_prepare() and .rescan_done() driver notifiers;
 - declares and invokes the pcibios_rescan_prepare()/_done() hooks;
 - adds the PCI_IMMOVABLE_BARS flag.

The feature is disabled by default (via PCI_IMMOVABLE_BARS) until the final
patch of the series. It can be overridden per-arch using this flag or by
the following command line option:

pcie_movable_bars={ off | force }
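For clarity, the resulting decision logic - an explicit "off" always
disables, "force" enables regardless of the platform default, otherwise the
PCI_IMMOVABLE_BARS flag decides - can be written as a standalone function.
This mirrors pcie_movable_bars_setup()/pci_movable_bars_enabled() from the
patch in plain user-space C; the function name and the bool parameter are
illustrative:

```c
#include <stdbool.h>
#include <string.h>

/*
 * cmdline: the value of pcie_movable_bars= (or NULL if not given);
 * arch_immovable: whether the platform set PCI_IMMOVABLE_BARS.
 */
static bool movable_bars_enabled(const char *cmdline, bool arch_immovable)
{
	bool off = cmdline && !strcmp(cmdline, "off");
	bool force = cmdline && !strcmp(cmdline, "force");

	if (off)
		return false;	/* "off" wins over everything */
	if (force)
		return true;	/* "force" overrides the platform default */
	return !arch_immovable;
}
```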

CC: Sam Bobroff 
CC: Rajat Jain 
CC: Lukas Wunner 
CC: Oliver O'Halloran 
CC: David Laight 
Signed-off-by: Sergey Miroshnichenko 
---
 .../admin-guide/kernel-parameters.txt |  7 ++
 drivers/pci/pci-driver.c  |  2 +
 drivers/pci/pci.c | 24 ++
 drivers/pci/pci.h |  2 +
 drivers/pci/probe.c   | 86 ++-
 include/linux/pci.h   |  7 ++
 6 files changed, 126 insertions(+), 2 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt 
b/Documentation/admin-guide/kernel-parameters.txt
index 47d981a86e2f..e2274ee87a35 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3526,6 +3526,13 @@
nomsi   Do not use MSI for native PCIe PME signaling (this makes
all PCIe root ports use INTx for all services).
 
+   pcie_movable_bars=[PCIE]
+   Override the movable BARs support detection:
+   off
+   Disable even if supported by the platform
+   force
+   Enable even if not explicitly declared as supported
+
pcmv=   [HW,PCMCIA] BadgePAD 4
 
pd_ignore_unused
diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
index a8124e47bf6e..d11909e79263 100644
--- a/drivers/pci/pci-driver.c
+++ b/drivers/pci/pci-driver.c
@@ -1688,6 +1688,8 @@ static int __init pci_driver_init(void)
 {
int ret;
 
+   pci_add_flags(PCI_IMMOVABLE_BARS);
+
ret = bus_register(&pci_bus_type);
if (ret)
return ret;
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 61d951766087..3a504f58ac60 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -139,6 +139,30 @@ static int __init pcie_port_pm_setup(char *str)
 }
 __setup("pcie_port_pm=", pcie_port_pm_setup);
 
+static bool pcie_movable_bars_off;
+static bool pcie_movable_bars_force;
+static int __init pcie_movable_bars_setup(char *str)
+{
+   if (!strcmp(str, "off"))
+   pcie_movable_bars_off = true;
+   else if (!strcmp(str, "force"))
+   pcie_movable_bars_force = true;
+   return 1;
+}
+__setup("pcie_movable_bars=", pcie_movable_bars_setup);
+
+bool pci_movable_bars_enabled(void)
+{
+   if (pcie_movable_bars_off)
+   return false;
+
+   if (pcie_movable_bars_force)
+   return true;
+
+   return !pci_has_flag(PCI_IMMOVABLE_BARS);
+}
+EXPORT_SYMBOL(pci_movable_bars_enabled);
+
 /* Time to wait after a reset for device to become responsive */
#define PCIE_RESET_READY_POLL_MS 60000
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index d22d1b807701..be7acc477c64 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -257,6 +257,8 @@ bool pci_bus_clip_resource(struct pci_dev *dev, int idx);
 void 

[PATCH v5 01/23] PCI: Fix race condition in pci_enable/disable_device()

2019-08-16 Thread Sergey Miroshnichenko
This is a yet another approach to fix an old [1-2] concurrency issue, when:
 - two or more devices are being hot-added into a bridge which was
   initially empty;
 - a bridge with two or more devices is being hot-added;
 - during boot, if BIOS/bootloader/firmware doesn't pre-enable bridges.

The problem is that a bridge is reported as enabled before the MEM/IO bits
are actually written to the PCI_COMMAND register, so another driver thread
starts memory requests through the not-yet-enabled bridge:

 CPU0CPU1

 pci_enable_device_mem() pci_enable_device_mem()
   pci_enable_bridge() pci_enable_bridge()
 pci_is_enabled()
   return false;
 atomic_inc_return(enable_cnt)
 Start actual enabling the bridge
 ... pci_is_enabled()
 ...   return true;
 ... Start memory requests <-- FAIL
 ...
 Set the PCI_COMMAND_MEMORY bit <-- Must wait for this

Protect the pci_enable/disable_device() and pci_enable_bridge(), which is
similar to the previous solution from commit 40f11adc7cd9 ("PCI: Avoid race
while enabling upstream bridges"), but adds per-device mutexes and
prevents the dev->enable_cnt from incrementing early.
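The fixed ordering - serialize the check-then-enable sequence and bump the
count only after the hardware enable has completed - can be modeled in
user-space like this. All toy_* names are illustrative stand-ins, not the
kernel API, and the mutex is reduced to a flag since there is no real
concurrency in the example:

```c
#include <assert.h>
#include <stdbool.h>

/* A trivial stand-in for a mutex: enough to show the ordering. */
struct toy_mutex { bool locked; };

static void toy_lock(struct toy_mutex *m)   { assert(!m->locked); m->locked = true; }
static void toy_unlock(struct toy_mutex *m) { assert(m->locked);  m->locked = false; }

struct toy_pci_dev {
	struct toy_mutex enable_mutex;
	int enable_cnt;
	bool cmd_mem_bit;	/* stands in for PCI_COMMAND_MEMORY */
};

static bool toy_is_enabled(const struct toy_pci_dev *dev)
{
	return dev->enable_cnt > 0;
}

static void toy_hw_enable(struct toy_pci_dev *dev)
{
	/* The slow part that used to race with pci_is_enabled(): by the
	 * time anyone observes enable_cnt > 0, this write is done. */
	dev->cmd_mem_bit = true;
}

static void toy_enable_device(struct toy_pci_dev *dev)
{
	toy_lock(&dev->enable_mutex);
	if (toy_is_enabled(dev)) {
		toy_unlock(&dev->enable_mutex);
		return;			/* already enabled */
	}
	toy_hw_enable(dev);
	dev->enable_cnt++;		/* only after toy_hw_enable() */
	toy_unlock(&dev->enable_mutex);
}
```

In the broken sequence from the diagram, the count was incremented before
the PCI_COMMAND write, so CPU1 saw "enabled" too early; here that window is
closed by construction.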

CC: Srinath Mannam 
CC: Marta Rybczynska 
Signed-off-by: Sergey Miroshnichenko 

[1] 
https://lore.kernel.org/linux-pci/1501858648-8-1-git-send-email-srinath.man...@broadcom.com/T/#u
[RFC PATCH v3] pci: Concurrency issue during pci enable bridge

[2] 
https://lore.kernel.org/linux-pci/744877924.5841545.1521630049567.javamail.zim...@kalray.eu/T/#u
[RFC PATCH] nvme: avoid race-conditions when enabling devices
---
 drivers/pci/pci.c   | 26 ++
 drivers/pci/probe.c |  1 +
 include/linux/pci.h |  1 +
 3 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 1b27b5af3d55..e7f8c354e644 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1645,6 +1645,8 @@ static void pci_enable_bridge(struct pci_dev *dev)
struct pci_dev *bridge;
int retval;
 
+   mutex_lock(&dev->enable_mutex);
+
bridge = pci_upstream_bridge(dev);
if (bridge)
pci_enable_bridge(bridge);
@@ -1652,6 +1654,7 @@ static void pci_enable_bridge(struct pci_dev *dev)
if (pci_is_enabled(dev)) {
if (!dev->is_busmaster)
pci_set_master(dev);
+   mutex_unlock(&dev->enable_mutex);
return;
}
 
@@ -1660,11 +1663,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
pci_err(dev, "Error enabling bridge (%d), continuing\n",
retval);
pci_set_master(dev);
+   mutex_unlock(&dev->enable_mutex);
 }
 
 static int pci_enable_device_flags(struct pci_dev *dev, unsigned long flags)
 {
struct pci_dev *bridge;
+   /* Enable-locking of bridges is performed within the pci_enable_bridge() */
+   bool need_lock = !dev->subordinate;
int err;
int i, bars = 0;
 
@@ -1680,8 +1686,13 @@ static int pci_enable_device_flags(struct pci_dev *dev, 
unsigned long flags)
dev->current_state = (pmcsr & PCI_PM_CTRL_STATE_MASK);
}
 
-   if (atomic_inc_return(&dev->enable_cnt) > 1)
+   if (need_lock)
+   mutex_lock(&dev->enable_mutex);
+   if (pci_is_enabled(dev)) {
+   if (need_lock)
+   mutex_unlock(&dev->enable_mutex);
return 0;   /* already enabled */
+   }
 
bridge = pci_upstream_bridge(dev);
if (bridge)
@@ -1696,8 +1707,10 @@ static int pci_enable_device_flags(struct pci_dev *dev, 
unsigned long flags)
bars |= (1 << i);
 
err = do_pci_enable_device(dev, bars);
-   if (err < 0)
-   atomic_dec(&dev->enable_cnt);
+   if (err >= 0)
+   atomic_inc(&dev->enable_cnt);
+   if (need_lock)
+   mutex_unlock(&dev->enable_mutex);
return err;
 }
 
@@ -1941,15 +1954,20 @@ void pci_disable_device(struct pci_dev *dev)
if (dr)
dr->enabled = 0;
 
+   mutex_lock(&dev->enable_mutex);
dev_WARN_ONCE(&dev->dev, atomic_read(&dev->enable_cnt) <= 0,
  "disabling already-disabled device");

-   if (atomic_dec_return(&dev->enable_cnt) != 0)
+   if (atomic_dec_return(&dev->enable_cnt) != 0) {
+   mutex_unlock(&dev->enable_mutex);
return;
+   }
 
do_pci_disable_device(dev);
 
dev->is_busmaster = 0;
+
+   mutex_unlock(&dev->enable_mutex);
 }
 EXPORT_SYMBOL(pci_disable_device);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index a3c7338fad86..2e58ece820e8 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2427,6 +2427,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
INIT_LIST_HEAD(&dev->bus_list);
 

[PATCH v5 02/23] PCI: Enable bridge's I/O and MEM access for hotplugged devices

2019-08-16 Thread Sergey Miroshnichenko
The PCI_COMMAND_IO and PCI_COMMAND_MEMORY bits of the bridge must be
updated not only when enabling the bridge for the first time, but also if a
hotplugged device requests these types of resources.

Originally these bits were set by pci_enable_device_flags() only, which
exits early if the bridge is already pci_is_enabled(). So if the bridge was
initially empty (an edge case), then hotplugged devices fail to perform
I/O and memory accesses.

Signed-off-by: Sergey Miroshnichenko 
---
 drivers/pci/pci.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index e7f8c354e644..61d951766087 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1652,6 +1652,14 @@ static void pci_enable_bridge(struct pci_dev *dev)
pci_enable_bridge(bridge);
 
if (pci_is_enabled(dev)) {
+   int i, bars = 0;
+
+   for (i = PCI_BRIDGE_RESOURCES; i < DEVICE_COUNT_RESOURCE; i++) {
+   if (dev->resource[i].flags & (IORESOURCE_MEM | IORESOURCE_IO))
+   bars |= (1 << i);
+   }
+   do_pci_enable_device(dev, bars);
+
if (!dev->is_busmaster)
pci_set_master(dev);
mutex_unlock(&dev->enable_mutex);
-- 
2.21.0



[PATCH v5 00/23] PCI: Allow BAR movement during hotplug

2019-08-16 Thread Sergey Miroshnichenko
If the firmware or kernel has arranged memory for PCIe devices in a way
that doesn't provide enough space for BARs of a new hotplugged device, the
kernel can pause the drivers of the "obstructing" devices and move their
BARs, so the new BARs can fit into the freed spaces.

To rearrange the BARs and bridge windows, these patches release all of them
after a rescan and re-assign them in the same way as during the initial
PCIe topology scan at system boot.

When a driver is un-paused by the kernel after the PCIe rescan, it should
check if its BARs had moved, and ioremap() them.

Drivers indicate their support of the feature by implementing the new hooks
.rescan_prepare() and .rescan_done() in the struct pci_driver. If a driver
doesn't yet support the feature, BARs of its devices will be considered as
immovable (by checking the pci_dev_movable_bars_supported(dev)) and handled
in the same way as resources with the IORESOURCE_PCI_FIXED flag.

If a driver doesn't yet support the feature, its devices are guaranteed to
have their BARs remain untouched.
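A driver opting in would roughly do the following around a rescan. This is
only a control-flow sketch with stubbed toy_* structures, not a real kernel
driver - ioremap()/iounmap() and the pausing of DMA/MMIO traffic are reduced
to flag updates:

```c
#include <stdbool.h>
#include <stdint.h>

struct toy_bar {
	uintptr_t start;	/* bus address of the BAR */
};

struct toy_driver_state {
	struct toy_bar *bar;
	uintptr_t mapped_start;	/* what the current mapping points at */
	bool mapped;
	bool paused;
};

/* Called by the PCI core before the rescan (.rescan_prepare). */
static void rescan_prepare(struct toy_driver_state *st)
{
	st->paused = true;	/* pause all DMA/MMIO activity */
	st->mapped = false;	/* iounmap() the BAR in a real driver */
}

/* Called after the rescan (.rescan_done): the BAR may have moved. */
static void rescan_done(struct toy_driver_state *st)
{
	st->mapped_start = st->bar->start;	/* ioremap() in a real driver */
	st->mapped = true;
	st->paused = false;	/* resume */
}
```

The key point is that the driver never touches the device between the two
hooks, and never caches a mapping across them.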

Tested on:
 - x86_64 with "pci=realloc,assign-busses,use_crs,pcie_bus_peer2peer";
 - POWER8 PowerNV+OPAL+PHB3 ppc64le with [1] applied and the following:
   "pci=realloc,pcie_bus_peer2peer";
 - both platforms [with extra patches (yet to be submitted) for movable bus
   numbers]: manually initiated (via sysfs) rescan has found and turned on
   a hotplugged bridge.

Not so many platforms and test cases were covered, so all who are
interested are highly welcome to test on your setups - the more exotic the
better!

This patchset is a part of our work on adding support for hotplugging
bridges full of other bridges, NVME drives, SAS HBAs and GPUs without
special requirements such as Hot-Plug Controller, reservation of bus
numbers or memory regions by firmware, etc. The next patchset to submit
will implement the movable bus numbers.

[1] https://lists.ozlabs.org/pipermail/linuxppc-dev/2019-August/195272.html
[PATCH v6 0/5] powerpc/powernv/pci: Make hotplug self-sufficient, 
independent of FW and DT

Changes since v4:
 - Feature is enabled by default (turned on by one of the latest patches);
 - Add pci_dev_movable_bars_supported(dev) instead of marking the immovable
   BARs with the IORESOURCE_PCI_FIXED flag;
 - Set up PCIe bridges during rescan via sysfs, so MPS settings are now
   configured not only during system boot or pcihp events;
 - Allow movement of switch's BARs if claimed by portdrv;
 - Update EEH address caches after rescan for powerpc;
 - Don't disable completely hot-added devices which can't have BARs being
   fit - just disable their BARs, so they are still visible in lspci etc;
 - Clearer names: fixed_range_hard -> immovable_range, fixed_range_soft ->
   realloc_range;
 - Drop the patch for pci_restore_config_space() - fixed by properly using
   the runtime PM.

Changes since v3:
 - Rebased to the upstream, so the patches apply cleanly again.

Changes since v2:
 - Fixed double-assignment of bridge windows;
 - Fixed assignment of fixed prefetched resources;
 - Fixed releasing of fixed resources;
 - Fixed a debug message;
 - Removed auto-enabling the movable BARs for x86 - let's rely on the
   "pcie_movable_bars=force" option for now;
 - Reordered the patches - bugfixes first.

Changes since v1:
 - Add a "pcie_movable_bars={ off | force }" command line argument;
 - Handle the IORESOURCE_PCI_FIXED flag properly;
 - Don't move BARs of devices which don't support the feature;
 - Guarantee that new hotplugged devices will not steal memory from working
   devices by ignoring the failing new devices with the new PCI_DEV_IGNORE
   flag;
 - Add rescan_prepare()+rescan_done() to the struct pci_driver instead of
   using the reset_prepare()+reset_done() from struct pci_error_handlers;
 - Add a bugfix of a race condition;
 - Fixed hotplug in a non-pre-enabled (by BIOS/firmware) bridge;
 - Fix the compatibility of the feature with pm_runtime and D3-state;
 - Hotplug events from pciehp also can move BARs;
 - Add support of the feature to the NVME driver.

Sergey Miroshnichenko (23):
  PCI: Fix race condition in pci_enable/disable_device()
  PCI: Enable bridge's I/O and MEM access for hotplugged devices
  PCI: hotplug: Add a flag for the movable BARs feature
  PCI: Define PCI-specific version of the release_child_resources()
  PCI: hotplug: movable BARs: Fix reassigning the released bridge
windows
  PCI: hotplug: movable BARs: Recalculate all bridge windows during
rescan
  PCI: hotplug: movable BARs: Don't allow added devices to steal
resources
  PCI: Include fixed and immovable BARs into the bus size calculating
  PCI: Prohibit assigning BARs and bridge windows to non-direct parents
  PCI: hotplug: movable BARs: Try to assign unassigned resources only
once
  PCI: hotplug: movable BARs: Calculate immovable parts of bridge
windows
  PCI: hotplug: movable BARs: Compute limits for relocated bridge
windows
  PCI: Make sure bridge windows include their 

Re: [PATCH] powerpc/vdso32: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE

2019-08-16 Thread Segher Boessenkool
On Fri, Aug 16, 2019 at 01:01:50PM +, Christophe Leroy wrote:
> - add r3,r3,r5
> +78:  add r3,r3,r5

You can use actual names for the labels as well...  .Lsomething if you
want it to stay a local symbol only.


Segher


[PATCH v6 5/5] powerpc/pci: Enable assigning bus numbers instead of reading them from DT

2019-08-16 Thread Sergey Miroshnichenko
If the firmware indicates support of reassigning bus numbers via the PHB's
"ibm,supported-movable-bdfs" property in DT, PowerNV will not depend on PCI
topology info from DT anymore.

This makes it possible to re-enumerate the fabric, assign the new bus numbers
and switch from the pnv_php module to the standard pciehp driver for PCI
hotplug functionality.

Signed-off-by: Sergey Miroshnichenko 
---
 arch/powerpc/kernel/pci_dn.c | 5 +
 arch/powerpc/platforms/powernv/eeh-powernv.c | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index 261d61460eac..90f8d46550df 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -542,6 +542,11 @@ void pci_devs_phb_init_dynamic(struct pci_controller *phb)
phb->pci_data = pdn;
}
 
+   if (of_get_property(dn, "ibm,supported-movable-bdfs", NULL)) {
+   pci_add_flags(PCI_REASSIGN_ALL_BUS);
+   return;
+   }
+
/* Update dn->phb ptrs for new phb and children devices */
pci_traverse_device_nodes(dn, add_pdn, phb);
 }
diff --git a/arch/powerpc/platforms/powernv/eeh-powernv.c 
b/arch/powerpc/platforms/powernv/eeh-powernv.c
index 620a986209f5..eb01f16c4e60 100644
--- a/arch/powerpc/platforms/powernv/eeh-powernv.c
+++ b/arch/powerpc/platforms/powernv/eeh-powernv.c
@@ -41,7 +41,7 @@ void pnv_pcibios_bus_add_device(struct pci_dev *pdev)
 {
struct pci_dn *pdn = pci_get_pdn(pdev);
 
-   if (!pdev->is_virtfn)
+   if (!pdev->is_virtfn && !pci_has_flag(PCI_REASSIGN_ALL_BUS))
return;
 
/*
-- 
2.21.0



[PATCH v6 4/5] powerpc/powernv/pci: Hook up the writes to PCI_SECONDARY_BUS register

2019-08-16 Thread Sergey Miroshnichenko
Writing a new value to the PCI_SECONDARY_BUS register of the bridge means
that its children will become addressable on another address (new B in BDF)
or even un-addressable if the secondary bus is set to zero.

On PowerNV, device PEs are heavily BDF-dependent, so they must be updated
on every such change of its address.

Signed-off-by: Sergey Miroshnichenko 
---
 arch/powerpc/platforms/powernv/pci.c | 118 ++-
 1 file changed, 116 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index a5b04410c8b4..e9b4ed0f97a3 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -717,13 +717,127 @@ int pnv_pci_cfg_read(struct pci_dn *pdn,
where, size, val);
 }
 
+static void invalidate_children_pes(struct pci_dn *pdn)
+{
+   struct pnv_phb *phb = pdn->phb->private_data;
+   struct pci_dn *child;
+   bool found_pe = false;
+   int pe_num;
+   int pe_bus;
+
+   list_for_each_entry(child, &pdn->child_list, list) {
+   struct pnv_ioda_pe *pe = (child->pe_number != IODA_INVALID_PE) ?
+   &phb->ioda.pe_array[child->pe_number] :
+   NULL;
+
+   if (!child->busno)
+   continue;
+
+   if ((child->class_code >> 8) == PCI_CLASS_BRIDGE_PCI)
+   invalidate_children_pes(child);
+
+   if (pe) {
+   u8 rid_bus = (pe->rid >> 8) & 0xff;
+
+   if (rid_bus) {
+   pe_num = child->pe_number;
+   pe_bus = rid_bus;
+   found_pe = true;
+   }
+
+   pe->rid &= 0xff;
+   }
+
+   child->busno = 0;
+   }
+
+   if (found_pe) {
+   u16 rid = pe_bus << 8;
+
+   opal_pci_set_pe(phb->opal_id, pe_num, rid, 7, 0, 0, OPAL_UNMAP_PE);
+   }
+}
+
+static u8 pre_hook_new_sec_bus(struct pci_dn *pdn, u8 new_secondary_bus)
+{
+   u32 old_secondary_bus = 0;
+
+   if ((pdn->class_code >> 8) != PCI_CLASS_BRIDGE_PCI)
+   return 0;
+
+   pnv_pci_cfg_read(pdn, PCI_SECONDARY_BUS, 1, &old_secondary_bus);
+   old_secondary_bus &= 0xff;
+
+   if (old_secondary_bus != new_secondary_bus)
+   invalidate_children_pes(pdn);
+
+   return old_secondary_bus;
+}
+
+static void update_children_pes(struct pci_dn *pdn, u8 new_secondary_bus)
+{
+   struct pnv_phb *phb = pdn->phb->private_data;
+   struct pci_dn *child;
+   bool found_pe = false;
+   int pe_num;
+
+   if (!new_secondary_bus)
+   return;
+
+   list_for_each_entry(child, &pdn->child_list, list) {
+   struct pnv_ioda_pe *pe = (child->pe_number != IODA_INVALID_PE) ?
+   &phb->ioda.pe_array[child->pe_number] :
+   NULL;
+
+   if (child->busno)
+   continue;
+
+   child->busno = new_secondary_bus;
+
+   if (pe) {
+   pe->rid |= (child->busno << 8);
+   pe_num = child->pe_number;
+   found_pe = true;
+   }
+   }
+
+   if (found_pe) {
+   u16 rid = new_secondary_bus << 8;
+
+   opal_pci_set_pe(phb->opal_id, pe_num, rid, 7, 0, 0, OPAL_MAP_PE);
+   }
+}
+
+static void post_hook_new_sec_bus(struct pci_dn *pdn, u8 new_secondary_bus)
+{
+   if ((pdn->class_code >> 8) != PCI_CLASS_BRIDGE_PCI)
+   return;
+
+   update_children_pes(pdn, new_secondary_bus);
+}
+
 int pnv_pci_cfg_write(struct pci_dn *pdn,
  int where, int size, u32 val)
 {
struct pnv_phb *phb = pdn->phb->private_data;
+   u8 old_secondary_bus = 0, new_secondary_bus = 0;
+   int rc;
+
+   if (where == PCI_SECONDARY_BUS) {
+   new_secondary_bus = val & 0xff;
+   old_secondary_bus = pre_hook_new_sec_bus(pdn, new_secondary_bus);
+   } else if (where == PCI_PRIMARY_BUS && size > 1) {
+   new_secondary_bus = (val >> 8) & 0xff;
+   old_secondary_bus = pre_hook_new_sec_bus(pdn, new_secondary_bus);
+   }
 
-   return pnv_pci_cfg_write_raw(phb->opal_id, pdn->busno, pdn->devfn,
-where, size, val);
+   rc = pnv_pci_cfg_write_raw(phb->opal_id, pdn->busno, pdn->devfn,
+  where, size, val);
+
+   if (new_secondary_bus && old_secondary_bus != new_secondary_bus)
+   post_hook_new_sec_bus(pdn, new_secondary_bus);
+
+   return rc;
 }
 
#ifdef CONFIG_EEH
-- 
2.21.0



[PATCH v6 3/5] powerpc/pci: Create pci_dn on demand

2019-08-16 Thread Sergey Miroshnichenko
If a struct pci_dn hasn't yet been created for the PCIe device (there was
no DT node for it), allocate this structure and fill with info read from
the device directly.

Signed-off-by: Sergey Miroshnichenko 
---
 arch/powerpc/kernel/pci_dn.c | 88 ++--
 1 file changed, 74 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index e1a0ab2caafe..261d61460eac 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -20,6 +20,9 @@
 #include 
 #include 
 
+static struct pci_dn *pci_create_pdn_from_dev(struct pci_dev *pdev,
+ struct pci_dn *parent);
+
 /*
  * The function is used to find the firmware data of one
  * specific PCI device, which is attached to the indicated
@@ -52,6 +55,9 @@ struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus)
dn = pci_bus_to_OF_node(pbus);
pdn = dn ? PCI_DN(dn) : NULL;
 
+   if (!pdn && pbus->self)
+   pdn = pbus->self->dev.archdata.pci_data;
+
return pdn;
 }
 
@@ -61,10 +67,13 @@ struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
struct device_node *dn = NULL;
struct pci_dn *parent, *pdn;
struct pci_dev *pdev = NULL;
+   bool pdev_found = false;
 
/* Fast path: fetch from PCI device */
list_for_each_entry(pdev, &bus->devices, bus_list) {
if (pdev->devfn == devfn) {
+   pdev_found = true;
+
if (pdev->dev.archdata.pci_data)
return pdev->dev.archdata.pci_data;
 
@@ -73,6 +82,9 @@ struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
}
}
 
+   if (!pdev_found)
+   pdev = NULL;
+
/* Fast path: fetch from device node */
pdn = dn ? PCI_DN(dn) : NULL;
if (pdn)
@@ -85,9 +97,12 @@ struct pci_dn *pci_get_pdn_by_devfn(struct pci_bus *bus,
 
list_for_each_entry(pdn, &parent->child_list, list) {
if (pdn->busno == bus->number &&
-pdn->devfn == devfn)
-return pdn;
-}
+   pdn->devfn == devfn) {
+   if (pdev)
+   pdev->dev.archdata.pci_data = pdn;
+   return pdn;
+   }
+   }
 
return NULL;
 }
@@ -117,17 +132,17 @@ struct pci_dn *pci_get_pdn(struct pci_dev *pdev)
 
list_for_each_entry(pdn, &parent->child_list, list) {
if (pdn->busno == pdev->bus->number &&
-   pdn->devfn == pdev->devfn)
+   pdn->devfn == pdev->devfn) {
+   pdev->dev.archdata.pci_data = pdn;
return pdn;
+   }
}
 
-   return NULL;
+   return pci_create_pdn_from_dev(pdev, parent);
 }
 
-#ifdef CONFIG_PCI_IOV
-static struct pci_dn *add_one_dev_pci_data(struct pci_dn *parent,
-  int vf_index,
-  int busno, int devfn)
+static struct pci_dn *pci_alloc_pdn(struct pci_dn *parent,
+   int busno, int devfn)
 {
struct pci_dn *pdn;
 
@@ -143,7 +158,6 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn 
*parent,
pdn->parent = parent;
pdn->busno = busno;
pdn->devfn = devfn;
-   pdn->vf_index = vf_index;
pdn->pe_number = IODA_INVALID_PE;
INIT_LIST_HEAD(&pdn->child_list);
INIT_LIST_HEAD(&pdn->list);
@@ -151,7 +165,51 @@ static struct pci_dn *add_one_dev_pci_data(struct pci_dn 
*parent,
 
return pdn;
 }
-#endif
+
+static struct pci_dn *pci_create_pdn_from_dev(struct pci_dev *pdev,
+ struct pci_dn *parent)
+{
+   struct pci_dn *pdn = NULL;
+   u32 class_code;
+   u16 device_id;
+   u16 vendor_id;
+
+   if (!parent)
+   return NULL;
+
+   pdn = pci_alloc_pdn(parent, pdev->bus->busn_res.start, pdev->devfn);
+   pci_info(pdev, "Create a new pdn for devfn %2x\n", pdev->devfn / 8);
+
+   if (!pdn) {
+   pci_err(pdev, "%s: Failed to allocate pdn\n", __func__);
+   return NULL;
+   }
+
+   #ifdef CONFIG_EEH
+   if (!eeh_dev_init(pdn)) {
+   kfree(pdn);
+   pci_err(pdev, "%s: Failed to allocate edev\n", __func__);
+   return NULL;
+   }
+   #endif /* CONFIG_EEH */
+
+   pci_bus_read_config_word(pdev->bus, pdev->devfn,
+PCI_VENDOR_ID, &vendor_id);
+   pdn->vendor_id = vendor_id;
+
+   pci_bus_read_config_word(pdev->bus, pdev->devfn,
+PCI_DEVICE_ID, &device_id);
+   pdn->device_id = device_id;
+
+   pci_bus_read_config_dword(pdev->bus, pdev->devfn,
+ PCI_CLASS_REVISION, &class_code);
+   class_code >>= 8;
+   pdn->class_code = class_code;
+
+   

[PATCH v6 2/5] powerpc/powernv/pci: Suppress an EEH error when reading an empty slot

2019-08-16 Thread Sergey Miroshnichenko
Reading an empty slot returns all ones, which triggers a false
EEH error event on PowerNV. This patch unfreezes the bus where
it happened.

Reviewed-by: Oliver O'Halloran 
Signed-off-by: Sergey Miroshnichenko 
---
 arch/powerpc/include/asm/ppc-pci.h   |  1 +
 arch/powerpc/kernel/pci_dn.c |  2 +-
 arch/powerpc/platforms/powernv/pci.c | 31 +---
 3 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc-pci.h 
b/arch/powerpc/include/asm/ppc-pci.h
index cec2d6409515..8b51c8577b94 100644
--- a/arch/powerpc/include/asm/ppc-pci.h
+++ b/arch/powerpc/include/asm/ppc-pci.h
@@ -36,6 +36,7 @@ void *traverse_pci_dn(struct pci_dn *root,
  void *(*fn)(struct pci_dn *, void *),
  void *data);
 extern void pci_devs_phb_init_dynamic(struct pci_controller *phb);
+struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus);
 
 /* From rtas_pci.h */
 extern void init_pci_config_tokens (void);
diff --git a/arch/powerpc/kernel/pci_dn.c b/arch/powerpc/kernel/pci_dn.c
index c4c8c237a106..e1a0ab2caafe 100644
--- a/arch/powerpc/kernel/pci_dn.c
+++ b/arch/powerpc/kernel/pci_dn.c
@@ -27,7 +27,7 @@
  * one of PF's bridge. For other devices, their firmware
  * data is linked to that of their bridge.
  */
-static struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus)
+struct pci_dn *pci_bus_to_pdn(struct pci_bus *bus)
 {
struct pci_bus *pbus;
struct device_node *dn;
diff --git a/arch/powerpc/platforms/powernv/pci.c 
b/arch/powerpc/platforms/powernv/pci.c
index 8d6c094f074e..a5b04410c8b4 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -756,6 +756,21 @@ static inline pnv_pci_cfg_check(struct pci_dn *pdn)
 }
 #endif /* CONFIG_EEH */
 
+static int get_bus_pe_number(struct pci_bus *bus)
+{
+   struct pci_dn *pdn = pci_bus_to_pdn(bus);
+   struct pci_dn *child;
+
+   if (!pdn)
+   return IODA_INVALID_PE;
+
+   list_for_each_entry(child, &pdn->child_list, list)
+   if (child->pe_number != IODA_INVALID_PE)
+   return child->pe_number;
+
+   return IODA_INVALID_PE;
+}
+
 static int pnv_pci_read_config(struct pci_bus *bus,
   unsigned int devfn,
   int where, int size, u32 *val)
@@ -767,9 +782,19 @@ static int pnv_pci_read_config(struct pci_bus *bus,
 
*val = 0x;
pdn = pci_get_pdn_by_devfn(bus, devfn);
-   if (!pdn)
-   return pnv_pci_cfg_read_raw(phb->opal_id, bus->number, devfn,
-   where, size, val);
+   if (!pdn) {
+   int pe_number = get_bus_pe_number(bus);
+
+   ret = pnv_pci_cfg_read_raw(phb->opal_id, bus->number, devfn,
+  where, size, val);
+
+   if (!ret && (*val == EEH_IO_ERROR_VALUE(size)) && phb->unfreeze_pe)
+   phb->unfreeze_pe(phb, (pe_number == IODA_INVALID_PE) ?
+phb->ioda.reserved_pe_idx : pe_number,
+OPAL_EEH_ACTION_CLEAR_FREEZE_ALL);
+
+   return ret;
+   }
 
if (!pnv_pci_cfg_check(pdn))
return PCIBIOS_DEVICE_NOT_FOUND;
-- 
2.21.0



[PATCH v6 1/5] powerpc/pci: Access PCI config space directly w/o pci_dn

2019-08-16 Thread Sergey Miroshnichenko
To fetch an updated DT for the newly hotplugged device, OS must explicitly
request it from the firmware via the pnv_php driver.

If pnv_php wasn't triggered/loaded, it is still possible to discover new
devices if PCIe I/O does not stop in the absence of the pci_dn structure.

Reviewed-by: Oliver O'Halloran 
Signed-off-by: Sergey Miroshnichenko 
---
 arch/powerpc/kernel/rtas_pci.c   | 97 +++-
 arch/powerpc/platforms/powernv/pci.c | 64 --
 2 files changed, 109 insertions(+), 52 deletions(-)

diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
index ae5e43eaca48..912da28b3737 100644
--- a/arch/powerpc/kernel/rtas_pci.c
+++ b/arch/powerpc/kernel/rtas_pci.c
@@ -42,10 +42,26 @@ static inline int config_access_valid(struct pci_dn *dn, 
int where)
return 0;
 }
 
-int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
static int rtas_read_raw_config(unsigned long buid, int busno, unsigned int devfn,
+   int where, int size, u32 *val)
 {
int returnval = -1;
-   unsigned long buid, addr;
+   unsigned long addr = rtas_config_addr(busno, devfn, where);
+   int ret;
+
+   if (buid) {
+   ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval,
+   addr, BUID_HI(buid), BUID_LO(buid), size);
+   } else {
+   ret = rtas_call(read_pci_config, 2, 2, &returnval, addr, size);
+   }
+   *val = returnval;
+
+   return ret;
+}
+
+int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
+{
int ret;
 
if (!pdn)
@@ -58,16 +74,8 @@ int rtas_read_config(struct pci_dn *pdn, int where, int 
size, u32 *val)
return PCIBIOS_SET_FAILED;
 #endif
 
-   addr = rtas_config_addr(pdn->busno, pdn->devfn, where);
-   buid = pdn->phb->buid;
-   if (buid) {
-   ret = rtas_call(ibm_read_pci_config, 4, 2, &returnval,
-   addr, BUID_HI(buid), BUID_LO(buid), size);
-   } else {
-   ret = rtas_call(read_pci_config, 2, 2, , addr, size);
-   }
-   *val = returnval;
-
+   ret = rtas_read_raw_config(pdn->phb->buid, pdn->busno, pdn->devfn,
+  where, size, val);
if (ret)
return PCIBIOS_DEVICE_NOT_FOUND;
 
@@ -85,18 +93,44 @@ static int rtas_pci_read_config(struct pci_bus *bus,
 
pdn = pci_get_pdn_by_devfn(bus, devfn);
 
-   /* Validity of pdn is checked in here */
-   ret = rtas_read_config(pdn, where, size, val);
-   if (*val == EEH_IO_ERROR_VALUE(size) &&
-   eeh_dev_check_failure(pdn_to_eeh_dev(pdn)))
-   return PCIBIOS_DEVICE_NOT_FOUND;
+   if (pdn) {
+   /* Validity of pdn is checked in here */
+   ret = rtas_read_config(pdn, where, size, val);
+
+   if (*val == EEH_IO_ERROR_VALUE(size) &&
+   eeh_dev_check_failure(pdn_to_eeh_dev(pdn)))
+   ret = PCIBIOS_DEVICE_NOT_FOUND;
+   } else {
+   struct pci_controller *phb = pci_bus_to_host(bus);
+
+   ret = rtas_read_raw_config(phb->buid, bus->number, devfn,
+  where, size, val);
+   }
 
return ret;
 }
 
+static int rtas_write_raw_config(unsigned long buid, int busno, unsigned int devfn,
+int where, int size, u32 val)
+{
+   unsigned long addr = rtas_config_addr(busno, devfn, where);
+   int ret;
+
+   if (buid) {
+   ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr,
+   BUID_HI(buid), BUID_LO(buid), size, (ulong)val);
+   } else {
+   ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val);
+   }
+
+   if (ret)
+   return PCIBIOS_DEVICE_NOT_FOUND;
+
+   return PCIBIOS_SUCCESSFUL;
+}
+
 int rtas_write_config(struct pci_dn *pdn, int where, int size, u32 val)
 {
-   unsigned long buid, addr;
int ret;
 
if (!pdn)
@@ -109,15 +143,8 @@ int rtas_write_config(struct pci_dn *pdn, int where, int size, u32 val)
return PCIBIOS_SET_FAILED;
 #endif
 
-   addr = rtas_config_addr(pdn->busno, pdn->devfn, where);
-   buid = pdn->phb->buid;
-   if (buid) {
-   ret = rtas_call(ibm_write_pci_config, 5, 1, NULL, addr,
-   BUID_HI(buid), BUID_LO(buid), size, (ulong) val);
-   } else {
-   ret = rtas_call(write_pci_config, 3, 1, NULL, addr, size, (ulong)val);
-   }
-
+   ret = rtas_write_raw_config(pdn->phb->buid, pdn->busno, pdn->devfn,
+   where, size, val);
if (ret)
return PCIBIOS_DEVICE_NOT_FOUND;
 
@@ -128,12 +155,20 @@ static int rtas_pci_write_config(struct pci_bus *bus,
 unsigned int devfn,
 int where, int size, 

[PATCH v6 0/5] powerpc/powernv/pci: Make hotplug self-sufficient, independent of FW and DT

2019-08-16 Thread Sergey Miroshnichenko
Allow switching from the pnv_php module to the standard pciehp driver for
PowerNV, if the platform supports it, for example a server running on top
of skiboot with the [1] patchset applied.

Add the ability to discover hot-added devices which weren't added to the
Device Tree (by the pnv_php via an explicit OPAL call when a hotplug event
was intercepted) by direct access to the bus.

Sync the changes in PCIe topology (bus numbers and PEs) with the skiboot.

Tested on POWER8 PowerNV+PHB3 ppc64le (our Vesnin server) with:
 - the pciehp driver active;
 - the pnv_php driver disabled;
 - the "pci=pcie_bus_peer2peer,realloc" kernel command line argument;
 - controlled hotplug of a network card with SR-IOV works;
 - activating of SR-IOV on a network card works;
 - [with extra patches for movable BARs and bus numbers] manually initiated
   (via sysfs) rescan has found and turned on a hotplugged bridge.

[1] https://lists.ozlabs.org/pipermail/skiboot/2019-August/015140.html
[Skiboot] [PATCH v3 0/5] core/pci: Track changes of topology by an OS

Change since v5:
 - Activates on "ibm,supported-movable-bdfs" property in DT from skiboot
   instead of the "pci=realloc" flag;
 - Removed the code refactoring patches - will send them separately.

Changes since v4:
 - Fixed failing build when EEH is disabled in a kernel config;
 - Unfreeze the bus on EEH_IO_ERROR_VALUE(size), not only 0x;
 - Replaced the 0xff magic constant with phb->ioda.reserved_pe_idx;
 - Renamed create_pdn() -> pci_create_pdn_from_dev();
 - Renamed add_one_dev_pci_data(..., vf_index, ...) -> pci_alloc_pdn();
 - Renamed add_dev_pci_data() -> pci_create_vf_pdns();
 - Renamed remove_dev_pci_data() -> pci_destroy_vf_pdns();
 - Removed the patch fixing uninitialized IOMMU group - now it is fixed in
   commit 8f5b27347e88 ("powerpc/powernv/sriov: Register IOMMU groups for
   VFs")

Changes since v3:
 - Subject changed;
 - Don't disable EEH during rescan anymore - instead just unfreeze the
   target buses deliberately;
 - Add synchronization with the firmware when changing the PCIe topology;
 - Fixed for VFs;
 - Code cleanup.

Changes since v2:
 - Don't reassign bus numbers on PowerNV by default (to retain the default
   behavior), but only when pci=realloc is passed;
 - Less code affected;
 - pci_add_device_node_info is refactored with add_one_dev_pci_data;
 - Minor code cleanup.

Changes since v1:
 - Fixed build for ppc64le and ppc64be when CONFIG_PCI_IOV is disabled;
 - Fixed build for ppc64e when CONFIG_EEH is disabled;
 - Fixed code style warnings.

Sergey Miroshnichenko (5):
  powerpc/pci: Access PCI config space directly w/o pci_dn
  powerpc/powernv/pci: Suppress an EEH error when reading an empty slot
  powerpc/pci: Create pci_dn on demand
  powerpc/powernv/pci: Hook up the writes to PCI_SECONDARY_BUS register
  powerpc/pci: Enable assigning bus numbers instead of reading them from
DT

 arch/powerpc/include/asm/ppc-pci.h   |   1 +
 arch/powerpc/kernel/pci_dn.c |  95 +++--
 arch/powerpc/kernel/rtas_pci.c   |  97 ++---
 arch/powerpc/platforms/powernv/eeh-powernv.c |   2 +-
 arch/powerpc/platforms/powernv/pci.c | 205 +--
 5 files changed, 331 insertions(+), 69 deletions(-)

-- 
2.21.0



Re: [PATCH 3/6] powerpc: Convert flush_icache_range & friends to C

2019-08-16 Thread Christophe Leroy




On 15/08/2019 at 09:29, christophe leroy wrote:



On 15/08/2019 at 06:10, Alastair D'Silva wrote:

From: Alastair D'Silva 

Similar to commit 22e9c88d486a
("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
this patch converts flush_icache_range() to C, and reimplements the
following functions as wrappers around it:
__flush_dcache_icache
__flush_dcache_icache_phys


Not sure you can do that for __flush_dcache_icache_phys(), see detailed 
comments below




I just sent you an RFC patch that could be the way to convert 
__flush_dcache_icache_phys() to C.


Feel free to modify it as wished and include it in your series.

Christophe


[RFC PATCH] powerpc: Convert ____flush_dcache_icache_phys() to C

2019-08-16 Thread Christophe Leroy
Resulting code (8xx with 16 bytes per cacheline and 16k pages)

016c <__flush_dcache_icache_phys>:
 16c:   54 63 00 22 rlwinm  r3,r3,0,0,17
 170:   7d 20 00 a6 mfmsr   r9
 174:   39 40 04 00 li  r10,1024
 178:   55 28 07 34 rlwinm  r8,r9,0,28,26
 17c:   7c 67 1b 78 mr  r7,r3
 180:   7d 49 03 a6 mtctr   r10
 184:   7d 00 01 24 mtmsr   r8
 188:   4c 00 01 2c isync
 18c:   7c 00 18 6c dcbst   0,r3
 190:   38 63 00 10 addi    r3,r3,16
 194:   42 00 ff f8 bdnz18c <__flush_dcache_icache_phys+0x20>
 198:   7c 00 04 ac hwsync
 19c:   7d 49 03 a6 mtctr   r10
 1a0:   7c 00 3f ac icbi0,r7
 1a4:   38 e7 00 10 addi    r7,r7,16
 1a8:   42 00 ff f8 bdnz1a0 <__flush_dcache_icache_phys+0x34>
 1ac:   7c 00 04 ac hwsync
 1b0:   7d 20 01 24 mtmsr   r9
 1b4:   4c 00 01 2c isync
 1b8:   4e 80 00 20 blr

Signed-off-by: Christophe Leroy 
---
 This patch is on top of Alastair's series "powerpc: convert cache asm to C"
 Patch 3 of that series should touch __flush_dcache_icache_phys and this
 patch could come just after patch 3.

 arch/powerpc/include/asm/cacheflush.h |  8 +
 arch/powerpc/mm/mem.c | 55 ---
 2 files changed, 53 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
index 1826bf2cc137..bf4f2dc4eb76 100644
--- a/arch/powerpc/include/asm/cacheflush.h
+++ b/arch/powerpc/include/asm/cacheflush.h
@@ -47,6 +47,14 @@ void flush_icache_user_range(struct vm_area_struct *vma,
struct page *page, unsigned long addr,
int len);
 void flush_dcache_icache_page(struct page *page);
+#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
+void __flush_dcache_icache_phys(unsigned long physaddr);
+#else
+static inline void __flush_dcache_icache_phys(unsigned long physaddr)
+{
+   BUG();
+}
+#endif
 
 /**
  * flush_dcache_range(): Write any modified data cache blocks out to memory 
and invalidate them.
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index 43be99de7c9a..43009f9227c4 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -402,6 +402,50 @@ void flush_dcache_page(struct page *page)
 }
 EXPORT_SYMBOL(flush_dcache_page);
 
+#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE)
+void __flush_dcache_icache_phys(unsigned long physaddr)
+{
+   unsigned long bytes = l1_dcache_bytes();
+   unsigned long nb = PAGE_SIZE / bytes;
+   unsigned long addr = physaddr & PAGE_MASK;
+   unsigned long msr, msr0;
+   unsigned long loop1 = addr, loop2 = addr;
+
+   if (cpu_has_feature(CPU_FTR_COHERENT_ICACHE)) {
+   /* For a snooping icache, we still need a dummy icbi to purge all the
+* prefetched instructions from the ifetch buffers. We also need a sync
+* before the icbi to order the actual stores to memory that might
+* have modified instructions with the icbi.
+*/
+   mb(); /* sync */
+   icbi((void *)addr);
+   mb(); /* sync */
+   isync();
+   return;
+   }
+   msr0 = mfmsr();
+   msr = msr0 & ~MSR_DR;
+   asm volatile(
+   "   mtctr %2;"
+   "   mtmsr %3;"
+   "   isync;"
+   "0: dcbst   0, %0;"
+   "   addi    %0, %0, %4;"
+   "   bdnz    0b;"
+   "   sync;"
+   "   mtctr %2;"
+   "1: icbi    0, %1;"
+   "   addi    %1, %1, %4;"
+   "   bdnz    1b;"
+   "   sync;"
+   "   mtmsr %5;"
+   "   isync;"
+   : "+r" (loop1), "+r" (loop2)
+   : "r" (nb), "r" (msr), "i" (bytes), "r" (msr0)
+   : "ctr", "memory");
+}
+#endif
+
 void flush_dcache_icache_page(struct page *page)
 {
 #ifdef CONFIG_HUGETLB_PAGE
@@ -419,16 +463,7 @@ void flush_dcache_icache_page(struct page *page)
__flush_dcache_icache(start);
kunmap_atomic(start);
} else {
-   unsigned long msr = mfmsr();
-
-   /* Clear the DR bit so that we operate on physical
-* rather than virtual addresses
-*/
-   mtmsr(msr & ~(MSR_DR));
-
-   __flush_dcache_icache((void *)physaddr);
-
-   mtmsr(msr);
+   __flush_dcache_icache_phys(page_to_pfn(page) << PAGE_SHIFT);
}
 #endif
 }
-- 
2.13.3



[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #35 from Christophe Leroy (christophe.le...@c-s.fr) ---
On 16/08/2019 at 16:38, bugzilla-dae...@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=204371
> 
> --- Comment #34 from Erhard F. (erhar...@mailbox.org) ---
> On Fri, 16 Aug 2019 08:22:31 +
> bugzilla-dae...@bugzilla.kernel.org wrote:
> 
>> https://bugzilla.kernel.org/show_bug.cgi?id=204371
>>
>> --- Comment #32 from Christophe Leroy (christophe.le...@c-s.fr) ---
>> Then see if the WARNING on kfree() in  btrfs_free_dummy_fs_info() is still
>> there.
> With latest changes there are no complaints of the kernel any longer. btrfs
> selftests pass, mounting and unmounting a btrfs partition works without any
> suspicious dmesg output.
> 

That's good news. Will you handle submitting the patch to the BTRFS file system?

-- 
You are receiving this mail because:
You are on the CC list for the bug.

RE: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for DWC

2019-08-16 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 16 August 2019 20:35
> To: Xiaowei Bao 
> Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> mark.rutl...@arm.com; shawn...@kernel.org; Leo Li
> ; kis...@ti.com; lorenzo.pieral...@arm.com;
> a...@arndb.de; gre...@linuxfoundation.org; M.h. Lian
> ; Roy Zang ;
> linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> linuxppc-dev@lists.ozlabs.org; Z.q. Hou 
> Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for
> DWC
> 
> On Fri, Aug 16, 2019 at 11:00:01AM +, Xiaowei Bao wrote:
> >
> >
> > > -Original Message-
> > > From: Andrew Murray 
> > > Sent: 16 August 2019 17:45
> > > To: Xiaowei Bao 
> > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > > mark.rutl...@arm.com; shawn...@kernel.org; Leo Li
> > > ; kis...@ti.com; lorenzo.pieral...@arm.com;
> > > a...@arndb.de; gre...@linuxfoundation.org; M.h. Lian
> > > ; Roy Zang ;
> > > linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> > > linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > > linuxppc-dev@lists.ozlabs.org; Z.q. Hou 
> > > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs
> > > support for DWC
> > >
> > > On Fri, Aug 16, 2019 at 02:55:41AM +, Xiaowei Bao wrote:
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Andrew Murray 
> > > > > Sent: 15 August 2019 19:32
> > > > > To: Xiaowei Bao 
> > > > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > > > > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> > > > > shawn...@kernel.org; Leo Li ; kis...@ti.com;
> > > > > lorenzo.pieral...@arm.com; a...@arndb.de;
> > > > > gre...@linuxfoundation.org; M.h. Lian ;
> > > > > Mingkai Hu ; Roy Zang ;
> > > > > linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> > > > > linux-ker...@vger.kernel.org;
> > > > > linux-arm-ker...@lists.infradead.org;
> > > > > linuxppc-dev@lists.ozlabs.org
> > > > > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs
> > > > > support for DWC
> > > > >
> > > > > On Thu, Aug 15, 2019 at 04:37:07PM +0800, Xiaowei Bao wrote:
> > > > > > Add multiple PFs support for DWC: different PFs have different
> > > > > > config spaces; we use the pf-offset property, obtained from the
> > > > > > DTS, to access each PF's config space.
> > > > >
> > > > > Thanks for the patch. I haven't seen a cover letter for this
> > > > > series, is there one missing?
> > > > Maybe I miss, I will add you to review next time, thanks a lot for
> > > > your
> > > comments.
> > > > >
> > > > >
> > > > > >
> > > > > > Signed-off-by: Xiaowei Bao 
> > > > > > ---
> > > > > >  drivers/pci/controller/dwc/pcie-designware-ep.c |  97
> > > > > +-
> > > > > >  drivers/pci/controller/dwc/pcie-designware.c| 105
> > > > > ++--
> > > > > >  drivers/pci/controller/dwc/pcie-designware.h|  10 ++-
> > > > > >  include/linux/pci-epc.h |   1 +
> > > > > >  4 files changed, 164 insertions(+), 49 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > > index 2bf5a35..75e2955 100644
> > > > > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > > @@ -19,12 +19,14 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep
> > > *ep)
> > > > > > pci_epc_linkup(epc);
> > > > > >  }
> > > > > >
> > > > > > -static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum
> > > > > > pci_barno
> > > > > bar,
> > > > > > -  int flags)
> > > > > > +static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8
> func_no,
> > > > > > +  enum pci_barno bar, int flags)
> > > > > >  {
> > > > > > u32 reg;
> > > > > > +   struct pci_epc *epc = pci->ep.epc;
> > > > > > +   u32 pf_base = func_no * epc->pf_offset;
> > > > > >
> > > > > > -   reg = PCI_BASE_ADDRESS_0 + (4 * bar);
> > > > > > +   reg = pf_base + PCI_BASE_ADDRESS_0 + (4 * bar);
> > > > >
> > > > > I think I'd rather see this arithmetic (and the one for
> > > > > determining
> > > > > pf_base) inside a macro or inline header function. This would
> > > > > make this code more readable and reduce the chances of an error
> > > > > by avoiding
> > > duplication of code.
> > > > >
> > > > > For example look at cdns_pcie_ep_fn_writeb and
> > > > > ROCKCHIP_PCIE_EP_FUNC_BASE for examples of other EP drivers that
> > > > > do this.
> > > > Agree, this looks fine, thanks a lot for your comments, I will use
> > > > this way to access the registers in next version patch.
> > > > >
> > > > >
> > > > > > dw_pcie_dbi_ro_wr_en(pci);
> > > > > > dw_pcie_writel_dbi2(pci, reg, 0x0);
> > > > > > dw_pcie_writel_dbi(pci, reg, 0x0); @@ -37,7 +39,12 @@ static
> > > > > > 

Re: 5.2.7 kernel doesn't boot on G5

2019-08-16 Thread Christian Marillat
On 16 Aug 2019 16:05, Andreas Schwab  wrote:

> On Aug 16 2019, Christian Marillat  wrote:
>
On 15 Aug 2019 19:50, christophe leroy  wrote:
>>
>> [...]
>>
>>> Can you test with latest stable version, ie 5.2.8 ?
>>
>> Built from my G5 with make-kpkg and still doesn't boot :
>
> FWIW, 5.2.0 is working fine on my G5 (PowerMac7,3).

Mine is a PowerMac11,2 "Quadcore" and / is on a RAID0

As 4.19.5 boots, I don't think it is a hardware problem.

Christian


[PATCH] powerpc/vdso32: inline __get_datapage()

2019-08-16 Thread Christophe Leroy
__get_datapage() is only a few instructions to retrieve the
address of the page where the kernel stores data to the VDSO.

By inlining this function into its users, a bl/blr pair and
a mflr/mtlr pair are avoided, plus a few reg moves.

The improvement is noticeable (about 55 nsec/call on an 8xx)

vdsotest before the patch:
gettimeofday:vdso: 731 nsec/call
clock-gettime-realtime-coarse:vdso: 668 nsec/call
clock-gettime-monotonic-coarse:vdso: 745 nsec/call

vdsotest after the patch:
gettimeofday:vdso: 677 nsec/call
clock-gettime-realtime-coarse:vdso: 613 nsec/call
clock-gettime-monotonic-coarse:vdso: 690 nsec/call

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/kernel/vdso32/cacheflush.S   | 10 +-
 arch/powerpc/kernel/vdso32/datapage.S | 29 -
 arch/powerpc/kernel/vdso32/datapage.h | 12 
 arch/powerpc/kernel/vdso32/gettimeofday.S | 11 +--
 4 files changed, 26 insertions(+), 36 deletions(-)
 create mode 100644 arch/powerpc/kernel/vdso32/datapage.h

diff --git a/arch/powerpc/kernel/vdso32/cacheflush.S b/arch/powerpc/kernel/vdso32/cacheflush.S
index 7f882e7b9f43..e9453837e4ee 100644
--- a/arch/powerpc/kernel/vdso32/cacheflush.S
+++ b/arch/powerpc/kernel/vdso32/cacheflush.S
@@ -10,6 +10,8 @@
 #include 
 #include 
 
+#include "datapage.h"
+
.text
 
 /*
@@ -24,14 +26,12 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
   .cfi_startproc
mflr    r12
   .cfi_register lr,r12
-   mr  r11,r3
-   bl  __get_datapage@local
+   get_datapage r10, r0
mtlr    r12
-   mr  r10,r3
 
lwz r7,CFG_DCACHE_BLOCKSZ(r10)
addi    r5,r7,-1
-   andc    r6,r11,r5   /* round low to line bdy */
+   andc    r6,r3,r5    /* round low to line bdy */
subf    r8,r6,r4    /* compute length */
add r8,r8,r5/* ensure we get enough */
lwz r9,CFG_DCACHE_LOGBLOCKSZ(r10)
@@ -48,7 +48,7 @@ V_FUNCTION_BEGIN(__kernel_sync_dicache)
 
lwz r7,CFG_ICACHE_BLOCKSZ(r10)
addi    r5,r7,-1
-   andc    r6,r11,r5   /* round low to line bdy */
+   andc    r6,r3,r5    /* round low to line bdy */
subf    r8,r6,r4    /* compute length */
add r8,r8,r5
lwz r9,CFG_ICACHE_LOGBLOCKSZ(r10)
diff --git a/arch/powerpc/kernel/vdso32/datapage.S b/arch/powerpc/kernel/vdso32/datapage.S
index 6984125b9fc0..d480d2d4a3fe 100644
--- a/arch/powerpc/kernel/vdso32/datapage.S
+++ b/arch/powerpc/kernel/vdso32/datapage.S
@@ -11,34 +11,13 @@
 #include 
 #include 
 
+#include "datapage.h"
+
.text
.global __kernel_datapage_offset;
 __kernel_datapage_offset:
.long   0
 
-V_FUNCTION_BEGIN(__get_datapage)
-  .cfi_startproc
-   /* We don't want that exposed or overridable as we want other objects
-* to be able to bl directly to here
-*/
-   .protected __get_datapage
-   .hidden __get_datapage
-
-   mflr    r0
-  .cfi_register lr,r0
-
-   bcl 20,31,data_page_branch
-data_page_branch:
-   mflr    r3
-   mtlr    r0
-   addi    r3, r3, __kernel_datapage_offset-data_page_branch
-   lwz r0,0(r3)
-  .cfi_restore lr
-   add r3,r0,r3
-   blr
-  .cfi_endproc
-V_FUNCTION_END(__get_datapage)
-
 /*
  * void *__kernel_get_syscall_map(unsigned int *syscall_count) ;
  *
@@ -53,7 +32,7 @@ V_FUNCTION_BEGIN(__kernel_get_syscall_map)
mflr    r12
   .cfi_register lr,r12
mr  r4,r3
-   bl  __get_datapage@local
+   get_datapage r3, r0
mtlr    r12
addi    r3,r3,CFG_SYSCALL_MAP32
cmpli   cr0,r4,0
@@ -74,7 +53,7 @@ V_FUNCTION_BEGIN(__kernel_get_tbfreq)
   .cfi_startproc
mflr    r12
   .cfi_register lr,r12
-   bl  __get_datapage@local
+   get_datapage r3, r0
lwz r4,(CFG_TB_TICKS_PER_SEC + 4)(r3)
lwz r3,CFG_TB_TICKS_PER_SEC(r3)
mtlr    r12
diff --git a/arch/powerpc/kernel/vdso32/datapage.h b/arch/powerpc/kernel/vdso32/datapage.h
new file mode 100644
index ..ad96256be090
--- /dev/null
+++ b/arch/powerpc/kernel/vdso32/datapage.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+.macro get_datapage ptr, tmp
+   bcl 20,31,888f
+888:
+   mflr    \ptr
+   addi    \ptr, \ptr, __kernel_datapage_offset - 888b
+   lwz \tmp, 0(\ptr)
+   add \ptr, \tmp, \ptr
+.endm
+
+
diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
index e10098cde89c..91a58f01dcd5 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -12,6 +12,8 @@
 #include 
 #include 
 
+#include "datapage.h"
+
 /* Offset for the low 32-bit part of a field of long type */
 #ifdef CONFIG_PPC64
 #define LOPART 4
@@ -35,8 +37,7 @@ V_FUNCTION_BEGIN(__kernel_gettimeofday)
 

[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #34 from Erhard F. (erhar...@mailbox.org) ---
On Fri, 16 Aug 2019 08:22:31 +
bugzilla-dae...@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=204371
> 
> --- Comment #32 from Christophe Leroy (christophe.le...@c-s.fr) ---
> Then see if the WARNING on kfree() in  btrfs_free_dummy_fs_info() is still
> there.
With the latest changes the kernel no longer complains. btrfs selftests
pass, and mounting and unmounting a btrfs partition works without any
suspicious dmesg output.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #33 from Erhard F. (erhar...@mailbox.org) ---
On Fri, 16 Aug 2019 08:22:31 +
bugzilla-dae...@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=204371
> 
> --- Comment #32 from Christophe Leroy (christophe.le...@c-s.fr) ---
> I think first thing is to fix test_add_free_space_entry() :
> - replace the map = kzalloc(...) by map = (void *)get_zeroed_page(...) like
> in
> other places.
> - replace the kfree(map); by free_page((unsigned long)map);
diff --git a/fs/btrfs/free-space-cache.c b/fs/btrfs/free-space-cache.c
index 062be9dde4c6..ed15645b4321 100644
--- a/fs/btrfs/free-space-cache.c
+++ b/fs/btrfs/free-space-cache.c
@@ -764,7 +764,7 @@ static int __load_free_space_cache(struct btrfs_root *root, struct inode *inode,
} else {
ASSERT(num_bitmaps);
num_bitmaps--;
-   e->bitmap = kzalloc(PAGE_SIZE, GFP_NOFS);
+   e->bitmap = (void *)get_zeroed_page(GFP_NOFS);
if (!e->bitmap) {
kmem_cache_free(
btrfs_free_space_cachep, e);
@@ -1881,7 +1881,7 @@ static void free_bitmap(struct btrfs_free_space_ctl *ctl,
struct btrfs_free_space *bitmap_info)
 {
unlink_free_space(ctl, bitmap_info);
-   kfree(bitmap_info->bitmap);
+   free_page((unsigned long)bitmap_info->bitmap);
kmem_cache_free(btrfs_free_space_cachep, bitmap_info);
ctl->total_bitmaps--;
ctl->op->recalc_thresholds(ctl);
@@ -2135,7 +2135,7 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl *ctl,
}

/* allocate the bitmap */
-   info->bitmap = kzalloc(PAGE_SIZE, GFP_NOFS);
+   info->bitmap = (void *)get_zeroed_page(GFP_NOFS);
spin_lock(&ctl->tree_lock);
if (!info->bitmap) {
ret = -ENOMEM;
@@ -2146,7 +2146,7 @@ static int insert_into_bitmap(struct btrfs_free_space_ctl
*ctl,

 out:
if (info) {
-   kfree(info->bitmap);
+   free_page((unsigned long)info->bitmap);
kmem_cache_free(btrfs_free_space_cachep, info);
}

@@ -2802,7 +2802,7 @@ u64 btrfs_alloc_from_cluster(struct btrfs_block_group_cache *block_group,
if (entry->bytes == 0) {
ctl->free_extents--;
if (entry->bitmap) {
-   kfree(entry->bitmap);
+   free_page((unsigned long)entry->bitmap);
ctl->total_bitmaps--;
ctl->op->recalc_thresholds(ctl);
}
@@ -3606,7 +3606,7 @@ int test_add_free_space_entry(struct btrfs_block_group_cache *cache,
}

if (!map) {
-   map = kzalloc(PAGE_SIZE, GFP_NOFS);
+   map = (void *)get_zeroed_page(GFP_NOFS);
if (!map) {
kmem_cache_free(btrfs_free_space_cachep, info);
return -ENOMEM;
@@ -3635,7 +3635,7 @@ int test_add_free_space_entry(struct btrfs_block_group_cache *cache,

if (info)
kmem_cache_free(btrfs_free_space_cachep, info);
-   kfree(map);
+   free_page((unsigned long)map);
return 0;
 }

-- 
You are receiving this mail because:
You are on the CC list for the bug.

[PATCH] powerpc: Set right value of Speculation_Store_Bypass in /proc//status

2019-08-16 Thread Gustavo Walbon
The issue showed the status value of Speculation_Store_Bypass in
/proc//status as `unknown` on PowerPC systems.

This patch fixes the checking of the speculation mitigation status, so it
can be reported as "not vulnerable", "globally mitigated" or "vulnerable".

Link: https://github.com/linuxppc/issues/issues/255

Signed-off-by: Gustavo Walbon 
---
 arch/powerpc/kernel/security.c | 25 -
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index e1c9cf079503..754ae4238d4e 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -14,7 +14,7 @@
 #include 
 #include 
 #include 
-
+#include 
 
 unsigned long powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
 
@@ -339,6 +339,29 @@ ssize_t cpu_show_spec_store_bypass(struct device *dev, 
struct device_attribute *
return sprintf(buf, "Vulnerable\n");
 }
 
+static int ssb_prctl_get(struct task_struct *task)
+{
+   if (stf_barrier) {
+   if (stf_enabled_flush_types == STF_BARRIER_NONE)
+   return PR_SPEC_NOT_AFFECTED;
+   else
+   return PR_SPEC_DISABLE;
+   } else
+   return PR_SPEC_DISABLE_NOEXEC;
+
+   return -EINVAL;
+}
+
+int arch_prctl_spec_ctrl_get(struct task_struct *task, unsigned long which)
+{
+   switch (which) {
+   case PR_SPEC_STORE_BYPASS:
+   return ssb_prctl_get(task);
+   default:
+   return -ENODEV;
+   }
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int stf_barrier_set(void *data, u64 val)
 {
-- 
2.19.1



Re: 5.2.7 kernel doesn't boot on G5

2019-08-16 Thread Andreas Schwab
On Aug 16 2019, Christian Marillat  wrote:

On 15 Aug 2019 19:50, christophe leroy  wrote:
>
> [...]
>
>> Can you test with latest stable version, ie 5.2.8 ?
>
> Built from my G5 with make-kpkg and still doesn't boot :

FWIW, 5.2.0 is working fine on my G5 (PowerMac7,3).

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."


[PATCH] crypto: vmx/xts - use fallback for ciphertext stealing

2019-08-16 Thread Ard Biesheuvel
For correctness and compliance with the XTS-AES specification, we are
adding support for ciphertext stealing to XTS implementations, even
though no use cases are known that will be enabled by this.

Since the Power8 implementation already has a fallback skcipher standby
for other purposes, let's use it for this purpose as well. If ciphertext
stealing use cases ever become a bottleneck, we can always revisit this.

Signed-off-by: Ard Biesheuvel 
---
 drivers/crypto/vmx/aes_xts.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/vmx/aes_xts.c b/drivers/crypto/vmx/aes_xts.c
index 49f7258045fa..d59e736882f6 100644
--- a/drivers/crypto/vmx/aes_xts.c
+++ b/drivers/crypto/vmx/aes_xts.c
@@ -84,7 +84,7 @@ static int p8_aes_xts_crypt(struct skcipher_request *req, int enc)
u8 tweak[AES_BLOCK_SIZE];
int ret;
 
-   if (!crypto_simd_usable()) {
+   if (!crypto_simd_usable() || (req->cryptlen % XTS_BLOCK_SIZE) != 0) {
struct skcipher_request *subreq = skcipher_request_ctx(req);
 
*subreq = *req;
-- 
2.17.1



[PATCH] powerpc/vdso32: Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE

2019-08-16 Thread Christophe Leroy
This is copied and adapted from commit 5c929885f1bb ("powerpc/vdso64:
Add support for CLOCK_{REALTIME/MONOTONIC}_COARSE")
from Santosh Sivaraj 

Benchmark from vdsotest:
clock-gettime-realtime: syscall: 3601 nsec/call
clock-gettime-realtime:libc: 1072 nsec/call
clock-gettime-realtime:vdso: 931 nsec/call
clock-gettime-monotonic: syscall: 4034 nsec/call
clock-gettime-monotonic:libc: 1213 nsec/call
clock-gettime-monotonic:vdso: 1076 nsec/call
clock-gettime-realtime-coarse: syscall: 2722 nsec/call
clock-gettime-realtime-coarse:libc: 805 nsec/call
clock-gettime-realtime-coarse:vdso: 668 nsec/call
clock-gettime-monotonic-coarse: syscall: 2949 nsec/call
clock-gettime-monotonic-coarse:libc: 882 nsec/call
clock-gettime-monotonic-coarse:vdso: 745 nsec/call

Signed-off-by: Christophe Leroy 
Cc: Naveen N. Rao 
Cc: Santosh Sivaraj 
Link: https://github.com/linuxppc/issues/issues/41
---
 arch/powerpc/kernel/vdso32/gettimeofday.S | 48 ++-
 1 file changed, 47 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
index becd9f8767ed..e10098cde89c 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -71,6 +71,12 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
cmpli   cr0,r3,CLOCK_REALTIME
cmpli   cr1,r3,CLOCK_MONOTONIC
cror    cr0*4+eq,cr0*4+eq,cr1*4+eq
+
+   cmpli   cr5,r3,CLOCK_REALTIME_COARSE
+   cmpli   cr6,r3,CLOCK_MONOTONIC_COARSE
+   cror    cr5*4+eq,cr5*4+eq,cr6*4+eq
+
+   cror    cr0*4+eq,cr0*4+eq,cr5*4+eq
bne cr0,99f
 
mflrr12 /* r12 saves lr */
@@ -80,6 +86,7 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
mr  r9,r3   /* datapage ptr in r9 */
lis r7,NSEC_PER_SEC@h   /* want nanoseconds */
ori r7,r7,NSEC_PER_SEC@l
+   beq cr5,70f
50: bl  __do_get_tspec@local    /* get sec/nsec from tb & kernel */
bne cr1,80f /* not monotonic -> all done */
 
@@ -106,12 +113,51 @@ V_FUNCTION_BEGIN(__kernel_clock_gettime)
lwz r0,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
cmpl    cr0,r8,r0   /* check if updated */
bne-    50b
+   b   78f
+
+   /*
+* For coarse clocks we get data directly from the vdso data page, so
+* we don't need to call __do_get_tspec, but we still need to do the
+* counter trick.
+*/
+70:    lwz r8,(CFG_TB_UPDATE_COUNT+LOPART)(r9)
+   andi.   r0,r8,1 /* pending update ? loop */
+   bne-    70b
+   add r9,r9,r0/* r0 is already 0 */
+
+   /*
+* CLOCK_REALTIME_COARSE, below values are needed for MONOTONIC_COARSE
+* too
+*/
+   lwz r3,STAMP_XTIME+TSPC32_TV_SEC(r9)
+   lwz r4,STAMP_XTIME+TSPC32_TV_NSEC(r9)
+   bne cr6,75f
+
+   /* CLOCK_MONOTONIC_COARSE */
+   lwz r5,(WTOM_CLOCK_SEC+LOPART)(r9)
+   lwz r6,WTOM_CLOCK_NSEC(r9)
+
+   /* check if counter has updated */
+   or  r0,r6,r5
+75:    or  r0,r0,r3
+   or  r0,r0,r4
+   xor r0,r0,r0
+   add r3,r3,r0
+   lwz r0,CFG_TB_UPDATE_COUNT+LOPART(r9)
+   cmpl    cr0,r0,r8   /* check if updated */
+   bne-    70b
+
+   /* Counter has not updated, so continue calculating proper values for
+* sec and nsec if monotonic coarse, or just return with the proper
+* values for realtime.
+*/
+   bne cr6,80f
 
/* Calculate and store result. Note that this mimics the C code,
 * which may cause funny results if nsec goes negative... is that
 * possible at all ?
 */
-   add r3,r3,r5
+78:    add r3,r3,r5
add r4,r4,r6
cmpw    cr0,r4,r7
cmpwi   cr1,r4,0
-- 
2.13.3



Re: 5.2.7 kernel doesn't boot on G5

2019-08-16 Thread Christian Marillat
On 15 Aug 2019 19:50, christophe leroy  wrote:

[...]

> Can you test with latest stable version, ie 5.2.8 ?

Built from my G5 with make-kpkg and still doesn't boot :

https://www.deb-multimedia.org/tests/20190816_142333.jpg

Christian


Re: [PATCH] powerpc/futex: fix warning: 'oldval' may be used uninitialized in this function

2019-08-16 Thread Michael Ellerman
Christophe Leroy  writes:
>   CC  kernel/futex.o
> kernel/futex.c: In function 'do_futex':
> kernel/futex.c:1676:17: warning: 'oldval' may be used uninitialized in this 
> function [-Wmaybe-uninitialized]
>return oldval == cmparg;
>  ^
> kernel/futex.c:1651:6: note: 'oldval' was declared here
>   int oldval, ret;
>   ^
>
> This is because arch_futex_atomic_op_inuser() only sets *oval
> if ret is NUL and GCC doesn't see that it will use it only when

I prefer 0 to "NUL", as ret is an int. I'll reword it. But otherwise
this looks OK.

cheers

> ret is NUL.
>
> Anyway, the non-NUL ret path is an error path that won't suffer from
> setting *oval, and as *oval is a local var in futex_atomic_op_inuser()
> it will have no impact.
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/futex.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/futex.h 
> b/arch/powerpc/include/asm/futex.h
> index 3a6aa57b9d90..eea28ca679db 100644
> --- a/arch/powerpc/include/asm/futex.h
> +++ b/arch/powerpc/include/asm/futex.h
> @@ -60,8 +60,7 @@ static inline int arch_futex_atomic_op_inuser(int op, int 
> oparg, int *oval,
>  
>   pagefault_enable();
>  
> - if (!ret)
> - *oval = oldval;
> + *oval = oldval;
>  
>   prevent_write_to_user(uaddr, sizeof(*uaddr));
>   return ret;
> -- 
> 2.13.3


Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for DWC

2019-08-16 Thread Andrew Murray
On Fri, Aug 16, 2019 at 11:00:01AM +, Xiaowei Bao wrote:
> 
> 
> > -Original Message-
> > From: Andrew Murray 
> > Sent: 2019年8月16日 17:45
> > To: Xiaowei Bao 
> > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > mark.rutl...@arm.com; shawn...@kernel.org; Leo Li
> > ; kis...@ti.com; lorenzo.pieral...@arm.com;
> > a...@arndb.de; gre...@linuxfoundation.org; M.h. Lian
> > ; Roy Zang ;
> > linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> > linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > linuxppc-dev@lists.ozlabs.org; Z.q. Hou 
> > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for
> > DWC
> > 
> > On Fri, Aug 16, 2019 at 02:55:41AM +, Xiaowei Bao wrote:
> > >
> > >
> > > > -Original Message-
> > > > From: Andrew Murray 
> > > > Sent: 2019年8月15日 19:32
> > > > To: Xiaowei Bao 
> > > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > > > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> > > > shawn...@kernel.org; Leo Li ; kis...@ti.com;
> > > > lorenzo.pieral...@arm.com; a...@arndb.de;
> > > > gre...@linuxfoundation.org; M.h. Lian ;
> > > > Mingkai Hu ; Roy Zang ;
> > > > linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> > > > linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > > > linuxppc-dev@lists.ozlabs.org
> > > > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs
> > > > support for DWC
> > > >
> > > > On Thu, Aug 15, 2019 at 04:37:07PM +0800, Xiaowei Bao wrote:
> > > > > Add multiple PFs support for DWC, different PF have different
> > > > > config space, we use pf-offset property which get from the DTS to
> > > > > access the different pF config space.
> > > >
> > > > Thanks for the patch. I haven't seen a cover letter for this series,
> > > > is there one missing?
> > > Maybe I miss, I will add you to review next time, thanks a lot for your
> > comments.
> > > >
> > > >
> > > > >
> > > > > Signed-off-by: Xiaowei Bao 
> > > > > ---
> > > > >  drivers/pci/controller/dwc/pcie-designware-ep.c |  97
> > > > +-
> > > > >  drivers/pci/controller/dwc/pcie-designware.c| 105
> > > > ++--
> > > > >  drivers/pci/controller/dwc/pcie-designware.h|  10 ++-
> > > > >  include/linux/pci-epc.h |   1 +
> > > > >  4 files changed, 164 insertions(+), 49 deletions(-)
> > > > >
> > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > index 2bf5a35..75e2955 100644
> > > > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > > @@ -19,12 +19,14 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep
> > *ep)
> > > > >   pci_epc_linkup(epc);
> > > > >  }
> > > > >
> > > > > -static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum
> > > > > pci_barno
> > > > bar,
> > > > > -int flags)
> > > > > +static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8 func_no,
> > > > > +enum pci_barno bar, int flags)
> > > > >  {
> > > > >   u32 reg;
> > > > > + struct pci_epc *epc = pci->ep.epc;
> > > > > + u32 pf_base = func_no * epc->pf_offset;
> > > > >
> > > > > - reg = PCI_BASE_ADDRESS_0 + (4 * bar);
> > > > > + reg = pf_base + PCI_BASE_ADDRESS_0 + (4 * bar);
> > > >
> > > > I think I'd rather see this arithmetic (and the one for determining
> > > > pf_base) inside a macro or inline header function. This would make
> > > > this code more readable and reduce the chances of an error by avoiding
> > duplication of code.
> > > >
> > > > For example look at cdns_pcie_ep_fn_writeb and
> > > > ROCKCHIP_PCIE_EP_FUNC_BASE for examples of other EP drivers that do
> > > > this.
> > > Agree, this looks fine, thanks a lot for your comments, I will use
> > > this way to access the registers in next version patch.
> > > >
> > > >
> > > > >   dw_pcie_dbi_ro_wr_en(pci);
> > > > >   dw_pcie_writel_dbi2(pci, reg, 0x0);
> > > > >   dw_pcie_writel_dbi(pci, reg, 0x0); @@ -37,7 +39,12 @@ static
> > > > > void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno
> > > > > bar,
> > > > >
> > > > >  void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
> > {
> > > > > - __dw_pcie_ep_reset_bar(pci, bar, 0);
> > > > > + u8 func_no, funcs;
> > > > > +
> > > > > + funcs = pci->ep.epc->max_functions;
> > > > > +
> > > > > + for (func_no = 0; func_no < funcs; func_no++)
> > > > > + __dw_pcie_ep_reset_bar(pci, func_no, bar, 0);
> > > > >  }
> > > > >
> > > > >  static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8
> > > > > cap_ptr, @@ -78,28 +85,29 @@ static int
> > > > > dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,  {
> > > > >   struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > > > >   struct dw_pcie *pci = 

Re: Oops in blk_mq_get_request() (was Re: ppc64le kernel panic on 5.2.9-rc1)

2019-08-16 Thread Ming Lei
On Fri, Aug 16, 2019 at 7:15 PM Michael Ellerman  wrote:
>
> Major Hayden  writes:
> > Hello there,
> >
> > The CKI Project just found a kernel panic while running the blktests
> > test suite on stable 5.2.9-rc1[0]. Michael Ellerman requested for this
> > list to be copied on these ppc64le failures.
> >
> > We have some logs[1] for these failures and they start with
> > "ppc64le_host_2_Storage_blktests*". We hope this helps!
> >
> > [0] 
> > https://lore.kernel.org/stable/255f9af4-6087-7f56-5860-5aa0397a7...@redhat.com/T/#t
> > [1] https://artifacts.cki-project.org/pipelines/100875/logs/
>
> Thanks for the report.
>
> It looks like you tested the stable queue yesterday, which AFAICS
> results in the exact same source tree as you tested above, and yet
> yesterday you didn't see the failure. So it's intermittent, which is
> annoying.
>
> Looking at the oops:
>
> [ 7101.930385] NIP:  c067b230 LR: c067b1d4 CTR: 
> c0029140
> [ 7101.930400] REGS: c00020391ccc35c0 TRAP: 0300   Not tainted  
> (5.2.9-rc1-2440e48.cki)
> [ 7101.930413] MSR:  9280b033   
> CR: 44002228  XER: 2004
> [ 7101.930433] CFAR: c01d9e28 DAR: 800a00066b9e7e28 DSISR: 4000 
> IRQMASK: 0
>GPR00: c067b1d4 c00020391ccc3850 c16cdb00 
> 067578a4ab31
>GPR04:   01f3fb0e 
> 0001
>GPR08: 800a00066b9e7d80 800a00066b9e7d80 0001 
> c0080cd5db88
>GPR12: c0029140 c000203fff6b9800 0008 
> c0080d5751e0
>GPR16: c000203985a8a6c0 c0080e013278 c000203985a8a700 
> 0038
>GPR20: 0030 0028 0020 
> f000
>GPR24: 0001 0400  
> 0023
>GPR28:   c00020391ccc38c8 
> c03ef1b0
> [ 7101.930544] NIP [c067b230] blk_mq_get_request+0x260/0x4b0
> [ 7101.930557] LR [c067b1d4] blk_mq_get_request+0x204/0x4b0
> [ 7101.930569] Call Trace:
> [ 7101.930577] [c00020391ccc3850] [c067b1d4] 
> blk_mq_get_request+0x204/0x4b0 (unreliable)
> [ 7101.930594] [c00020391ccc38a0] [c067b688] 
> blk_mq_alloc_request_hctx+0x108/0x1b0
> [ 7101.930617] [c00020391ccc3910] [c0080cd51aac] 
> nvme_alloc_request+0x54/0xe0 [nvme_core]
> [ 7101.930633] [c00020391ccc3940] [c0080cd5641c] 
> __nvme_submit_sync_cmd+0x64/0x290 [nvme_core]
> [ 7101.930651] [c00020391ccc39c0] [c0080d571650] 
> nvmf_connect_io_queue+0x148/0x1e0 [nvme_fabrics]
> [ 7101.930668] [c00020391ccc3ab0] [c0080e0106b0] 
> nvme_loop_connect_io_queues+0x98/0xf8 [nvme_loop]
> [ 7101.930684] [c00020391ccc3af0] [c0080e01116c] 
> nvme_loop_create_ctrl+0x434/0x6a0 [nvme_loop]
> [ 7101.930700] [c00020391ccc3bd0] [c0080d5724f0] 
> nvmf_dev_write+0xd38/0x124c [nvme_fabrics]
> [ 7101.930719] [c00020391ccc3d60] [c0421e58] __vfs_write+0x38/0x70
> [ 7101.930731] [c00020391ccc3d80] [c0426188] vfs_write+0xd8/0x250
> [ 7101.930744] [c00020391ccc3dd0] [c0426558] ksys_write+0x78/0x130
> [ 7101.930758] [c00020391ccc3e20] [c000bde4] system_call+0x5c/0x70
>
> And then the disassembly:
>
> x = op_is_sync(op)
> r27 = op = 0x0023
> c067b1e0:   3e 06 67 57 clrlwi  r7,r27,24   # r7 = r27 & 
> REQ_OP_MASK
>
> c067b1e4:   00 00 07 2c cmpwi   r7,0# if r7 == 
> REQ_OP_READ, x = 1 then goto label2
> x = 1
> c067b1e8:   01 00 20 39 li  r9,1
>
> ...
>
> r30 = data = c00020391ccc38c8
> r8 = data->ctx = 800a00066b9e7d80
> c067b208:   18 00 1e e9 ld  r8,24(r30)
> c067b20c:   18 00 82 41 beq c067b224 
>  ->  label2
>
> (op & (REQ_SYNC | REQ_FUA | REQ_PREFLUSH))
> c067b210:   1c 05 6a 57 rlwinm  r10,r27,0,20,14
> c067b214:   68 03 4a 55 rlwinm  r10,r10,0,13,20
> c067b218:   34 00 4a 7d cntlzw  r10,r10
> c067b21c:   7e d9 4a 55 rlwinm  r10,r10,27,5,31
> c067b220:   01 00 49 69 xori    r9,r10,1
>
> c067b224:   24 1f 29 79 rldicr  r9,r9,3,60
>   <-  label2
> r9 = x * 8
>
> c067b228:   01 00 e0 38 li  r7,1# for refcount_set
>
> r9 = data->ctx + x
> c067b22c:   14 4a 28 7d add r9,r8,r9
>
> data->ctx->rq_dispatched[op_is_sync(op)]++;
>
> r10 = data->ctx->rq_dispatched[x]
> c067b230:   a8 00 49 e9 ld  r10,168(r9)   
>   <-  NIP
>
> x++
> c067b234:   01 00 4a 39 addir10,r10,1
>
> data->ctx->rq_dispatched[x] = r10
> c067b238:   a8 00 49 f9 std r10,168(r9)
>
> refcount_set(&rq->ref, 1);
> 

Re: [PATCH] ASoC: imx-audmux: Add driver suspend and resume to support MEGA Fast

2019-08-16 Thread Mark Brown
On Fri, Aug 16, 2019 at 01:03:14AM -0400, Shengjiu Wang wrote:

> + for (i = 0; i < reg_max; i++)
> + regcache[i] = readl(audmux_base + i * 4);

If only there were some framework which provided a register cache!  :P


signature.asc
Description: PGP signature
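
Mark is referring to the regmap framework's register cache. A rough, untested sketch of what a regmap-backed suspend/resume could look like — the regmap_config values and driver-data plumbing here are illustrative assumptions, not the actual imx-audmux conversion:

```c
#include <linux/regmap.h>
#include <linux/pm.h>

/* Illustrative: a cached MMIO regmap lets the framework snapshot and
 * restore registers, replacing the open-coded readl()/writel() loops. */
static const struct regmap_config audmux_regmap_config = {
	.reg_bits     = 32,
	.val_bits     = 32,
	.reg_stride   = 4,
	.max_register = 13 * 4,          /* assumed register span */
	.cache_type   = REGCACHE_FLAT,
};

static int audmux_suspend(struct device *dev)
{
	struct regmap *map = dev_get_drvdata(dev);

	regcache_cache_only(map, true);  /* serve accesses from the cache */
	regcache_mark_dirty(map);        /* force a full resync on resume */
	return 0;
}

static int audmux_resume(struct device *dev)
{
	struct regmap *map = dev_get_drvdata(dev);

	regcache_cache_only(map, false);
	return regcache_sync(map);       /* write cached values back to hw */
}
```

The regmap would be created in probe with devm_regmap_init_mmio(), after which the manual regcache array in the patch below becomes unnecessary.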


Applied "ASoC: imx-audmux: Add driver suspend and resume to support MEGA Fast" to the asoc tree

2019-08-16 Thread Mark Brown
The patch

   ASoC: imx-audmux: Add driver suspend and resume to support MEGA Fast

has been applied to the asoc tree at

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-5.3

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

From 8661ab5b23d6d30d8687fc05bc1dba8f9a64b444 Mon Sep 17 00:00:00 2001
From: Shengjiu Wang 
Date: Fri, 16 Aug 2019 01:03:14 -0400
Subject: [PATCH] ASoC: imx-audmux: Add driver suspend and resume to support
 MEGA Fast

For i.MX6 SoloX, there is a mode of the SoC to shutdown all power
source of modules during system suspend and resume procedure.
Thus, AUDMUX needs to save all the values of registers before the
system suspend and restore them after the system resume.

Signed-off-by: Shengjiu Wang 
Link: 
https://lore.kernel.org/r/1565931794-7218-1-git-send-email-shengjiu.w...@nxp.com
Signed-off-by: Mark Brown 
---
 sound/soc/fsl/imx-audmux.c | 54 +-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/sound/soc/fsl/imx-audmux.c b/sound/soc/fsl/imx-audmux.c
index b2351cd33b0f..16ede3b5cb32 100644
--- a/sound/soc/fsl/imx-audmux.c
+++ b/sound/soc/fsl/imx-audmux.c
@@ -23,6 +23,8 @@
 
 static struct clk *audmux_clk;
 static void __iomem *audmux_base;
+static u32 *regcache;
+static u32 reg_max;
 
 #define IMX_AUDMUX_V2_PTCR(x)  ((x) * 8)
 #define IMX_AUDMUX_V2_PDCR(x)  ((x) * 8 + 4)
@@ -317,8 +319,23 @@ static int imx_audmux_probe(struct platform_device *pdev)
if (of_id)
pdev->id_entry = of_id->data;
audmux_type = pdev->id_entry->driver_data;
-   if (audmux_type == IMX31_AUDMUX)
+
+   switch (audmux_type) {
+   case IMX31_AUDMUX:
audmux_debugfs_init();
+   reg_max = 14;
+   break;
+   case IMX21_AUDMUX:
+   reg_max = 6;
+   break;
+   default:
+   dev_err(&pdev->dev, "unsupported version!\n");
+   return -EINVAL;
+   }
+
+   regcache = devm_kzalloc(&pdev->dev, sizeof(u32) * reg_max, GFP_KERNEL);
+   if (!regcache)
+   return -ENOMEM;
 
if (of_id)
imx_audmux_parse_dt_defaults(pdev, pdev->dev.of_node);
@@ -334,12 +351,47 @@ static int imx_audmux_remove(struct platform_device *pdev)
return 0;
 }
 
+#ifdef CONFIG_PM_SLEEP
+static int imx_audmux_suspend(struct device *dev)
+{
+   int i;
+
+   clk_prepare_enable(audmux_clk);
+
+   for (i = 0; i < reg_max; i++)
+   regcache[i] = readl(audmux_base + i * 4);
+
+   clk_disable_unprepare(audmux_clk);
+
+   return 0;
+}
+
+static int imx_audmux_resume(struct device *dev)
+{
+   int i;
+
+   clk_prepare_enable(audmux_clk);
+
+   for (i = 0; i < reg_max; i++)
+   writel(regcache[i], audmux_base + i * 4);
+
+   clk_disable_unprepare(audmux_clk);
+
+   return 0;
+}
+#endif /* CONFIG_PM_SLEEP */
+
+static const struct dev_pm_ops imx_audmux_pm = {
+   SET_SYSTEM_SLEEP_PM_OPS(imx_audmux_suspend, imx_audmux_resume)
+};
+
 static struct platform_driver imx_audmux_driver = {
.probe  = imx_audmux_probe,
.remove = imx_audmux_remove,
.id_table   = imx_audmux_ids,
.driver = {
.name   = DRIVER_NAME,
+   .pm = &imx_audmux_pm,
.of_match_table = imx_audmux_dt_ids,
}
 };
-- 
2.20.1



[PATCH v2] powerpc/32: Add VDSO version of getcpu

2019-08-16 Thread Christophe Leroy
Commit 18ad51dd342a ("powerpc: Add VDSO version of getcpu") added
getcpu() for PPC64 only, by making use of a user readable general
purpose SPR.

PPC32 doesn't have any such SPR, a full system call can still be
avoided by implementing a fast system call which reads the CPU id
in the task struct and returns immediately without going back in
virtual mode.

Before the patch, vdsotest reported:
getcpu: syscall: 1572 nsec/call
getcpu:libc: 1787 nsec/call
getcpu:vdso: not tested

Now, vdsotest reports:
getcpu: syscall: 1582 nsec/call
getcpu:libc: 667 nsec/call
getcpu:vdso: 368 nsec/call

For non SMP, just return CPU id 0 from the VDSO directly.

PPC32 doesn't support CONFIG_NUMA so NUMA node is always 0.

Signed-off-by: Christophe Leroy 

---
v2: fixed build error in getcpu.S
---
 arch/powerpc/include/asm/vdso.h |  2 ++
 arch/powerpc/kernel/head_32.h   | 13 +
 arch/powerpc/kernel/head_booke.h| 11 +++
 arch/powerpc/kernel/vdso32/Makefile |  4 +---
 arch/powerpc/kernel/vdso32/getcpu.S |  7 +++
 arch/powerpc/kernel/vdso32/vdso32.lds.S |  2 --
 6 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso.h b/arch/powerpc/include/asm/vdso.h
index b5e1f8f8a05c..adb54782df5f 100644
--- a/arch/powerpc/include/asm/vdso.h
+++ b/arch/powerpc/include/asm/vdso.h
@@ -16,6 +16,8 @@
 /* Define if 64 bits VDSO has procedure descriptors */
 #undef VDS64_HAS_DESCRIPTORS
 
+#define NR_MAGIC_FAST_VDSO_SYSCALL 0x789a
+
 #ifndef __ASSEMBLY__
 
 /* Offsets relative to thread->vdso_base */
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 4a692553651f..a2e38b59785a 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -3,6 +3,8 @@
 #define __HEAD_32_H__
 
 #include /* for STACK_FRAME_REGS_MARKER */
+#include 
+#include 
 
 /*
  * MSR_KERNEL is > 0x8000 on 4xx/Book-E since it include MSR_CE.
@@ -74,7 +76,13 @@
 .endm
 
 .macro SYSCALL_ENTRY trapno
+#ifdef CONFIG_SMP
+   cmplwi  cr0, r0, NR_MAGIC_FAST_VDSO_SYSCALL
+#endif
mfspr   r12,SPRN_SPRG_THREAD
+#ifdef CONFIG_SMP
+   beq-1f
+#endif
	mfcr	r10
lwz r11,TASK_STACK-THREAD(r12)
	mflr	r9
@@ -152,6 +160,11 @@
mtspr   SPRN_SRR0,r11
SYNC
RFI /* jump to handler, enable MMU */
+#ifdef CONFIG_SMP
+1:
+   lwz r5, TASK_CPU - THREAD(r12)
+   RFI
+#endif
 .endm
 
 /*
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 2ae635df9026..c534e87cac84 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -3,6 +3,8 @@
 #define __HEAD_BOOKE_H__
 
 #include /* for STACK_FRAME_REGS_MARKER */
+#include 
+#include 
 #include 
 #include 
 
@@ -104,6 +106,10 @@ FTR_SECTION_ELSE
 #ifdef CONFIG_KVM_BOOKE_HV
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 #endif
+#ifdef CONFIG_SMP
+   cmplwi  cr0, r0, NR_MAGIC_FAST_VDSO_SYSCALL
+   beq-1f
+#endif
BOOKE_CLEAR_BTB(r11)
lwz r11, TASK_STACK - THREAD(r10)
rlwinm  r12,r12,0,4,2   /* Clear SO bit in CR */
@@ -176,6 +182,11 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
mtspr   SPRN_SRR0,r11
SYNC
RFI /* jump to handler, enable MMU */
+#ifdef CONFIG_SMP
+1:
+   lwz r5, TASK_CPU - THREAD(r10)
+   RFI
+#endif
 .endm
 
 /* To handle the additional exception priority levels on 40x and Book-E
diff --git a/arch/powerpc/kernel/vdso32/Makefile 
b/arch/powerpc/kernel/vdso32/Makefile
index 06f54d947057..e147bbdc12cd 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -2,9 +2,7 @@
 
 # List of files in the vdso, has to be asm only for now
 
-obj-vdso32-$(CONFIG_PPC64) = getcpu.o
-obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o \
-   $(obj-vdso32-y)
+obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o getcpu.o
 
 # Build rules
 
diff --git a/arch/powerpc/kernel/vdso32/getcpu.S 
b/arch/powerpc/kernel/vdso32/getcpu.S
index 63e914539e1a..bde226ad904d 100644
--- a/arch/powerpc/kernel/vdso32/getcpu.S
+++ b/arch/powerpc/kernel/vdso32/getcpu.S
@@ -17,7 +17,14 @@
  */
 V_FUNCTION_BEGIN(__kernel_getcpu)
   .cfi_startproc
+#if defined(CONFIG_PPC64)
mfspr   r5,SPRN_SPRG_VDSO_READ
+#elif defined(CONFIG_SMP)
+   li  r0, NR_MAGIC_FAST_VDSO_SYSCALL
+   sc  /* returns cpuid in r5, clobbers cr0 and r10-r13 */
+#else
+   li  r5, 0
+#endif
cmpwi   cr0,r3,0
cmpwi   cr1,r4,0
clrlwi  r6,r5,16
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S 
b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 099a6db14e67..663880671e20 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -152,9 +152,7 @@ VERSION
__kernel_sync_dicache_p5;

RE: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode

2019-08-16 Thread Xiaowei Bao


> -Original Message-
> From: Kishon Vijay Abraham I 
> Sent: 2019年8月16日 18:50
> To: Xiaowei Bao ; Andrew Murray
> 
> Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ;
> lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org;
> M.h. Lian ; Mingkai Hu ;
> linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> linuxppc-dev@lists.ozlabs.org; Z.q. Hou 
> Subject: Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of
> MSI-X in EP mode
> 
> Hi,
> 
> On 16/08/19 8:28 AM, Xiaowei Bao wrote:
> >
> >
> >> -Original Message-
> >> From: Andrew Murray 
> >> Sent: 2019年8月15日 19:54
> >> To: Xiaowei Bao 
> >> Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> >> bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> >> shawn...@kernel.org; Leo Li ; kis...@ti.com;
> >> lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org;
> >> M.h. Lian ; Mingkai Hu
> ;
> >> Roy Zang ; linux-...@vger.kernel.org;
> >> devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> >> linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org
> >> Subject: Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode
> >> of MSI-X in EP mode
> >>
> >> On Thu, Aug 15, 2019 at 04:37:08PM +0800, Xiaowei Bao wrote:
> >>> Add the doorbell mode of MSI-X in EP mode.
> >>>
> >>> Signed-off-by: Xiaowei Bao 
> >>> ---
> >>>  drivers/pci/controller/dwc/pcie-designware-ep.c | 14
> ++
> >>>  drivers/pci/controller/dwc/pcie-designware.h| 14
> ++
> >>>  2 files changed, 28 insertions(+)
> >>>
> >>> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> >>> b/drivers/pci/controller/dwc/pcie-designware-ep.c
> >>> index 75e2955..e3a7cdf 100644
> >>> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> >>> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> >>> @@ -454,6 +454,20 @@ int dw_pcie_ep_raise_msi_irq(struct
> dw_pcie_ep
> >> *ep, u8 func_no,
> >>>   return 0;
> >>>  }
> >>>
> >>> +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8
> >> func_no,
> >>> +u16 interrupt_num)
> >>> +{
> >>> + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> >>> + u32 msg_data;
> >>> +
> >>> + msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) |
> >>> +(interrupt_num - 1);
> >>> +
> >>> + dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data);
> >>> +
> >>> + return 0;
> >>> +}
> >>> +
> >>>  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> >>> u16 interrupt_num)
> >>
> >> Have I understood correctly that the hardware provides an alternative
> >> mechanism that allows for raising MSI-X interrupts without the bother
> >> of reading the capabilities registers?
> > Yes, the hardware provide two way to MSI-X, please check the page 492
> > of
> > DWC_pcie_dm_registers_4.30 Menu.
> > MSIX_DOORBELL_OFF on page 492 0x948 Description: MSI-X Doorbell
> > Register>
> >>
> >> If so is there any good reason to keep dw_pcie_ep_raise_msix_irq?
> >> (And thus use it in dw_plat_pcie_ep_raise_irq also)?
> > I am not sure, but I think the dw_pcie_ep_raise_msix_irq function is
> > not correct, because I think we can't get the MSIX table from the
> > address ep->phys_base + tbl_addr, but I also don't know where I can get the
> correct MSIX table.
> 
> Sometime back when I tried raising MSI-X from EP, it was failing. It's quite
> possible dw_pcie_ep_raise_msix_irq function is not correct.
> 
> MSI-X table can be obtained from the inbound ATU corresponding to the MSIX
> bar.
> IMO MSI-X support in EP mode needs rework. For instance set_msix should
> also take BAR number as input to be configured in the MSI-X capability. The
> function driver (pci-epf-test.c) should allocate memory taking into account 
> the
> MSI-X table.
Hi Kishon,

Thanks a lot for your explanation; yes, we can get the MSI-X table from the inbound
ATU of the MSI-X BAR.
> 
> Thanks
> Kishon


Oops in blk_mq_get_request() (was Re: ppc64le kernel panic on 5.2.9-rc1)

2019-08-16 Thread Michael Ellerman
Major Hayden  writes:
> Hello there,
>
> The CKI Project just found a kernel panic while running the blktests
> test suite on stable 5.2.9-rc1[0]. Michael Ellerman requested for this
> list to be copied on these ppc64le failures.
>
> We have some logs[1] for these failures and they start with
> "ppc64le_host_2_Storage_blktests*". We hope this helps!
>
> [0] 
> https://lore.kernel.org/stable/255f9af4-6087-7f56-5860-5aa0397a7...@redhat.com/T/#t
> [1] https://artifacts.cki-project.org/pipelines/100875/logs/

Thanks for the report.

It looks like you tested the stable queue yesterday, which AFAICS
results in the exact same source tree as you tested above, and yet
yesterday you didn't see the failure. So it's intermittent, which is
annoying.

Looking at the oops:

[ 7101.930385] NIP:  c067b230 LR: c067b1d4 CTR: c0029140
[ 7101.930400] REGS: c00020391ccc35c0 TRAP: 0300   Not tainted  
(5.2.9-rc1-2440e48.cki)
[ 7101.930413] MSR:  9280b033   CR: 
44002228  XER: 2004
[ 7101.930433] CFAR: c01d9e28 DAR: 800a00066b9e7e28 DSISR: 4000 
IRQMASK: 0 
   GPR00: c067b1d4 c00020391ccc3850 c16cdb00 
067578a4ab31 
   GPR04:   01f3fb0e 
0001 
   GPR08: 800a00066b9e7d80 800a00066b9e7d80 0001 
c0080cd5db88 
   GPR12: c0029140 c000203fff6b9800 0008 
c0080d5751e0 
   GPR16: c000203985a8a6c0 c0080e013278 c000203985a8a700 
0038 
   GPR20: 0030 0028 0020 
f000 
   GPR24: 0001 0400  
0023 
   GPR28:   c00020391ccc38c8 
c03ef1b0 
[ 7101.930544] NIP [c067b230] blk_mq_get_request+0x260/0x4b0
[ 7101.930557] LR [c067b1d4] blk_mq_get_request+0x204/0x4b0
[ 7101.930569] Call Trace:
[ 7101.930577] [c00020391ccc3850] [c067b1d4] 
blk_mq_get_request+0x204/0x4b0 (unreliable)
[ 7101.930594] [c00020391ccc38a0] [c067b688] 
blk_mq_alloc_request_hctx+0x108/0x1b0
[ 7101.930617] [c00020391ccc3910] [c0080cd51aac] 
nvme_alloc_request+0x54/0xe0 [nvme_core]
[ 7101.930633] [c00020391ccc3940] [c0080cd5641c] 
__nvme_submit_sync_cmd+0x64/0x290 [nvme_core]
[ 7101.930651] [c00020391ccc39c0] [c0080d571650] 
nvmf_connect_io_queue+0x148/0x1e0 [nvme_fabrics]
[ 7101.930668] [c00020391ccc3ab0] [c0080e0106b0] 
nvme_loop_connect_io_queues+0x98/0xf8 [nvme_loop]
[ 7101.930684] [c00020391ccc3af0] [c0080e01116c] 
nvme_loop_create_ctrl+0x434/0x6a0 [nvme_loop]
[ 7101.930700] [c00020391ccc3bd0] [c0080d5724f0] 
nvmf_dev_write+0xd38/0x124c [nvme_fabrics]
[ 7101.930719] [c00020391ccc3d60] [c0421e58] __vfs_write+0x38/0x70
[ 7101.930731] [c00020391ccc3d80] [c0426188] vfs_write+0xd8/0x250
[ 7101.930744] [c00020391ccc3dd0] [c0426558] ksys_write+0x78/0x130
[ 7101.930758] [c00020391ccc3e20] [c000bde4] system_call+0x5c/0x70

And then the disassembly:

x = op_is_sync(op)
r27 = op = 0x0023
c067b1e0:   3e 06 67 57 clrlwi  r7,r27,24   # r7 = r27 & 
REQ_OP_MASK

c067b1e4:   00 00 07 2c cmpwi   r7,0# if r7 == 
REQ_OP_READ, x = 1 then goto label2
x = 1
c067b1e8:   01 00 20 39 li  r9,1

...

r30 = data = c00020391ccc38c8 
r8 = data->ctx = 800a00066b9e7d80
c067b208:   18 00 1e e9 ld  r8,24(r30)
c067b20c:   18 00 82 41 beq c067b224 
 ->  label2

(op & (REQ_SYNC | REQ_FUA | REQ_PREFLUSH))
c067b210:   1c 05 6a 57 rlwinm  r10,r27,0,20,14
c067b214:   68 03 4a 55 rlwinm  r10,r10,0,13,20
c067b218:   34 00 4a 7d cntlzw  r10,r10
c067b21c:   7e d9 4a 55 rlwinm  r10,r10,27,5,31
c067b220:   01 00 49 69 xori    r9,r10,1

c067b224:   24 1f 29 79 rldicr  r9,r9,3,60  
<-  label2
r9 = x * 8

c067b228:   01 00 e0 38 li  r7,1# for refcount_set

r9 = data->ctx + x
c067b22c:   14 4a 28 7d add r9,r8,r9

data->ctx->rq_dispatched[op_is_sync(op)]++;

r10 = data->ctx->rq_dispatched[x]
c067b230:   a8 00 49 e9 ld  r10,168(r9) 
<-  NIP

x++
c067b234:   01 00 4a 39 addir10,r10,1

data->ctx->rq_dispatched[x] = r10
c067b238:   a8 00 49 f9 std r10,168(r9)

refcount_set(&rq->ref, 1);
c067b23c:   d4 00 ff 90 stw r7,212(r31)


So we're oopsing at data->ctx->rq_dispatched[op_is_sync(op)]++.

data->ctx looks completely bogus, ie. 800a00066b9e7d80, that's not
anything like a valid kernel address.

And also op doesn't look like a valid op value, it's 

[PATCH] powerpc/32: Add VDSO version of getcpu

2019-08-16 Thread Christophe Leroy
Commit 18ad51dd342a ("powerpc: Add VDSO version of getcpu") added
getcpu() for PPC64 only, by making use of a user readable general
purpose SPR.

PPC32 doesn't have any such SPR, a full system call can still be
avoided by implementing a fast system call which reads the CPU id
in the task struct and returns immediately without going back in
virtual mode.

Before the patch, vdsotest reported:
getcpu: syscall: 1572 nsec/call
getcpu:libc: 1787 nsec/call
getcpu:vdso: not tested

Now, vdsotest reports:
getcpu: syscall: 1582 nsec/call
getcpu:libc: 667 nsec/call
getcpu:vdso: 368 nsec/call

For non SMP, just return CPU id 0 from the VDSO directly.

PPC32 doesn't support CONFIG_NUMA so NUMA node is always 0.

Signed-off-by: Christophe Leroy 
---
 arch/powerpc/include/asm/vdso.h |  2 ++
 arch/powerpc/kernel/head_32.h   | 13 +
 arch/powerpc/kernel/head_booke.h| 11 +++
 arch/powerpc/kernel/vdso32/Makefile |  4 +---
 arch/powerpc/kernel/vdso32/getcpu.S |  7 +++
 arch/powerpc/kernel/vdso32/vdso32.lds.S |  2 --
 6 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/vdso.h b/arch/powerpc/include/asm/vdso.h
index b5e1f8f8a05c..adb54782df5f 100644
--- a/arch/powerpc/include/asm/vdso.h
+++ b/arch/powerpc/include/asm/vdso.h
@@ -16,6 +16,8 @@
 /* Define if 64 bits VDSO has procedure descriptors */
 #undef VDS64_HAS_DESCRIPTORS
 
+#define NR_MAGIC_FAST_VDSO_SYSCALL 0x789a
+
 #ifndef __ASSEMBLY__
 
 /* Offsets relative to thread->vdso_base */
diff --git a/arch/powerpc/kernel/head_32.h b/arch/powerpc/kernel/head_32.h
index 4a692553651f..a2e38b59785a 100644
--- a/arch/powerpc/kernel/head_32.h
+++ b/arch/powerpc/kernel/head_32.h
@@ -3,6 +3,8 @@
 #define __HEAD_32_H__
 
 #include /* for STACK_FRAME_REGS_MARKER */
+#include 
+#include 
 
 /*
  * MSR_KERNEL is > 0x8000 on 4xx/Book-E since it include MSR_CE.
@@ -74,7 +76,13 @@
 .endm
 
 .macro SYSCALL_ENTRY trapno
+#ifdef CONFIG_SMP
+   cmplwi  cr0, r0, NR_MAGIC_FAST_VDSO_SYSCALL
+#endif
mfspr   r12,SPRN_SPRG_THREAD
+#ifdef CONFIG_SMP
+   beq-1f
+#endif
	mfcr	r10
lwz r11,TASK_STACK-THREAD(r12)
	mflr	r9
@@ -152,6 +160,11 @@
mtspr   SPRN_SRR0,r11
SYNC
RFI /* jump to handler, enable MMU */
+#ifdef CONFIG_SMP
+1:
+   lwz r5, TASK_CPU - THREAD(r12)
+   RFI
+#endif
 .endm
 
 /*
diff --git a/arch/powerpc/kernel/head_booke.h b/arch/powerpc/kernel/head_booke.h
index 2ae635df9026..c534e87cac84 100644
--- a/arch/powerpc/kernel/head_booke.h
+++ b/arch/powerpc/kernel/head_booke.h
@@ -3,6 +3,8 @@
 #define __HEAD_BOOKE_H__
 
 #include /* for STACK_FRAME_REGS_MARKER */
+#include 
+#include 
 #include 
 #include 
 
@@ -104,6 +106,10 @@ FTR_SECTION_ELSE
 #ifdef CONFIG_KVM_BOOKE_HV
 ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
 #endif
+#ifdef CONFIG_SMP
+   cmplwi  cr0, r0, NR_MAGIC_FAST_VDSO_SYSCALL
+   beq-1f
+#endif
BOOKE_CLEAR_BTB(r11)
lwz r11, TASK_STACK - THREAD(r10)
rlwinm  r12,r12,0,4,2   /* Clear SO bit in CR */
@@ -176,6 +182,11 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_EMB_HV)
mtspr   SPRN_SRR0,r11
SYNC
RFI /* jump to handler, enable MMU */
+#ifdef CONFIG_SMP
+1:
+   lwz r5, TASK_CPU - THREAD(r10)
+   RFI
+#endif
 .endm
 
 /* To handle the additional exception priority levels on 40x and Book-E
diff --git a/arch/powerpc/kernel/vdso32/Makefile 
b/arch/powerpc/kernel/vdso32/Makefile
index 06f54d947057..e147bbdc12cd 100644
--- a/arch/powerpc/kernel/vdso32/Makefile
+++ b/arch/powerpc/kernel/vdso32/Makefile
@@ -2,9 +2,7 @@
 
 # List of files in the vdso, has to be asm only for now
 
-obj-vdso32-$(CONFIG_PPC64) = getcpu.o
-obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o \
-   $(obj-vdso32-y)
+obj-vdso32 = sigtramp.o gettimeofday.o datapage.o cacheflush.o note.o getcpu.o
 
 # Build rules
 
diff --git a/arch/powerpc/kernel/vdso32/getcpu.S 
b/arch/powerpc/kernel/vdso32/getcpu.S
index 63e914539e1a..bd67a0c25c86 100644
--- a/arch/powerpc/kernel/vdso32/getcpu.S
+++ b/arch/powerpc/kernel/vdso32/getcpu.S
@@ -17,7 +17,14 @@
  */
 V_FUNCTION_BEGIN(__kernel_getcpu)
   .cfi_startproc
+#if defined(CONFIG_PPC64)
mfspr   r5,SPRN_SPRG_VDSO_READ
+#elif defined (CONFIG_SMP)*/
+   li  r0, NR_MAGIC_FAST_VDSO_SYSCALL
+   sc  /* returns cpuid in r5, clobbers cr0 and r10-r13 */
+#else
+   li  r5, 0
+#endif
cmpwi   cr0,r3,0
cmpwi   cr1,r4,0
clrlwi  r6,r5,16
diff --git a/arch/powerpc/kernel/vdso32/vdso32.lds.S 
b/arch/powerpc/kernel/vdso32/vdso32.lds.S
index 099a6db14e67..663880671e20 100644
--- a/arch/powerpc/kernel/vdso32/vdso32.lds.S
+++ b/arch/powerpc/kernel/vdso32/vdso32.lds.S
@@ -152,9 +152,7 @@ VERSION
__kernel_sync_dicache_p5;
__kernel_sigtramp32;

RE: [PATCH 05/10] PCI: layerscape: Modify the way of getting capability with different PEX

2019-08-16 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 16 August 2019 18:26
> To: Xiaowei Bao 
> Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ; kis...@ti.com;
> lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org;
> M.h. Lian ; Mingkai Hu ;
> Roy Zang ; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org; Z.q. Hou
> 
> Subject: Re: [PATCH 05/10] PCI: layerscape: Modify the way of getting
> capability with different PEX
> 
> On Fri, Aug 16, 2019 at 03:00:00AM +, Xiaowei Bao wrote:
> >
> >
> > > -Original Message-
> > > From: Andrew Murray 
> > > Sent: 15 August 2019 20:51
> > > To: Xiaowei Bao 
> > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> > > shawn...@kernel.org; Leo Li ; kis...@ti.com;
> > > lorenzo.pieral...@arm.com; a...@arndb.de;
> > > gre...@linuxfoundation.org; M.h. Lian ;
> > > Mingkai Hu ; Roy Zang ;
> > > linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> > > linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > > linuxppc-dev@lists.ozlabs.org
> > > Subject: Re: [PATCH 05/10] PCI: layerscape: Modify the way of
> > > getting capability with different PEX
> > >
> > > On Thu, Aug 15, 2019 at 04:37:11PM +0800, Xiaowei Bao wrote:
> > > > Different PCIe controllers on one board may have different MSI or
> > > > MSI-X capabilities, so change the way the MSI capability is
> > > > retrieved to make it more flexible.
> > > >
> > > > Signed-off-by: Xiaowei Bao 
> > > > ---
> > > >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 28
> > > > +++---
> > > >  1 file changed, 21 insertions(+), 7 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > > index be61d96..9404ca0 100644
> > > > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > > @@ -22,6 +22,7 @@
> > > >
> > > >  struct ls_pcie_ep {
> > > > struct dw_pcie  *pci;
> > > > +   struct pci_epc_features *ls_epc;
> > > >  };
> > > >
> > > >  #define to_ls_pcie_ep(x)   dev_get_drvdata((x)->dev)
> > > > @@ -40,25 +41,26 @@ static const struct of_device_id
> > > ls_pcie_ep_of_match[] = {
> > > > { },
> > > >  };
> > > >
> > > > -static const struct pci_epc_features ls_pcie_epc_features = {
> > > > -   .linkup_notifier = false,
> > > > -   .msi_capable = true,
> > > > -   .msix_capable = false,
> > > > -};
> > > > -
> > > >  static const struct pci_epc_features*
> > > > ls_pcie_ep_get_features(struct dw_pcie_ep *ep)  {
> > > > -   return &ls_pcie_epc_features;
> > > > +   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > +   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> > > > +
> > > > +   return pcie->ls_epc;
> > > >  }
> > > >
> > > >  static void ls_pcie_ep_init(struct dw_pcie_ep *ep)  {
> > > > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > +   struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> > > > enum pci_barno bar;
> > > >
> > > > for (bar = BAR_0; bar <= BAR_5; bar++)
> > > > dw_pcie_ep_reset_bar(pci, bar);
> > > > +
> > > > +   pcie->ls_epc->msi_capable = ep->msi_cap ? true : false;
> > > > +   pcie->ls_epc->msix_capable = ep->msix_cap ? true : false;
> > > >  }
> > > >
> > > >  static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8
> > > > func_no, @@
> > > > -118,6 +120,7 @@ static int __init ls_pcie_ep_probe(struct
> > > > platform_device
> > > *pdev)
> > > > struct device *dev = &pdev->dev;
> > > > struct dw_pcie *pci;
> > > > struct ls_pcie_ep *pcie;
> > > > +   struct pci_epc_features *ls_epc;
> > > > struct resource *dbi_base;
> > > > int ret;
> > > >
> > > > @@ -129,6 +132,10 @@ static int __init ls_pcie_ep_probe(struct
> > > platform_device *pdev)
> > > > if (!pci)
> > > > return -ENOMEM;
> > > >
> > > > +   ls_epc = devm_kzalloc(dev, sizeof(*ls_epc), GFP_KERNEL);
> > > > +   if (!ls_epc)
> > > > +   return -ENOMEM;
> > > > +
> > > > dbi_base = platform_get_resource_byname(pdev,
> IORESOURCE_MEM,
> > > "regs");
> > > > pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
> > > > if (IS_ERR(pci->dbi_base))
> > > > @@ -139,6 +146,13 @@ static int __init ls_pcie_ep_probe(struct
> > > platform_device *pdev)
> > > > pci->ops = &ls_pcie_ep_ops;
> > > > pcie->pci = pci;
> > > >
> > > > +   ls_epc->linkup_notifier = false,
> > > > +   ls_epc->msi_capable = true,
> > > > +   ls_epc->msix_capable = true,
> > >
> > > As [msi,msix]_capable is shortly set from 

RE: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode

2019-08-16 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 16 August 2019 18:20
> To: Xiaowei Bao 
> Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> shawn...@kernel.org; Leo Li ; kis...@ti.com;
> lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org;
> M.h. Lian ; Mingkai Hu ;
> linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> linuxppc-dev@lists.ozlabs.org; Z.q. Hou 
> Subject: Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of
> MSI-X in EP mode
> 
> On Fri, Aug 16, 2019 at 02:58:31AM +, Xiaowei Bao wrote:
> >
> >
> > > -Original Message-
> > > From: Andrew Murray 
> > > Sent: 15 August 2019 19:54
> > > To: Xiaowei Bao 
> > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> > > shawn...@kernel.org; Leo Li ; kis...@ti.com;
> > > lorenzo.pieral...@arm.com; a...@arndb.de;
> > > gre...@linuxfoundation.org; M.h. Lian ;
> > > Mingkai Hu ; Roy Zang ;
> > > linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> > > linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > > linuxppc-dev@lists.ozlabs.org
> > > Subject: Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode
> > > of MSI-X in EP mode
> > >
> > > On Thu, Aug 15, 2019 at 04:37:08PM +0800, Xiaowei Bao wrote:
> > > > Add the doorbell mode of MSI-X in EP mode.
> > > >
> > > > Signed-off-by: Xiaowei Bao 
> > > > ---
> > > >  drivers/pci/controller/dwc/pcie-designware-ep.c | 14
> ++
> > > >  drivers/pci/controller/dwc/pcie-designware.h| 14
> ++
> > > >  2 files changed, 28 insertions(+)
> > > >
> > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > index 75e2955..e3a7cdf 100644
> > > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > @@ -454,6 +454,20 @@ int dw_pcie_ep_raise_msi_irq(struct
> > > > dw_pcie_ep
> > > *ep, u8 func_no,
> > > > return 0;
> > > >  }
> > > >
> > > > +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8
> > > func_no,
> > > > +  u16 interrupt_num)
> > > > +{
> > > > +   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > +   u32 msg_data;
> > > > +
> > > > +   msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) |
> > > > +  (interrupt_num - 1);
> > > > +
> > > > +   dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data);
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > >  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> > > >   u16 interrupt_num)
> > >
> > > Have I understood correctly that the hardware provides an
> > > alternative mechanism that allows for raising MSI-X interrupts
> > > without the bother of reading the capabilities registers?
> > Yes, the hardware provides two ways to raise MSI-X; please check page
> > 492 of the DWC_pcie_dm_registers_4.30 manual:
> > MSIX_DOORBELL_OFF on page 492, 0x948, Description: MSI-X Doorbell
> > Register
> 
> Thanks for the reference.
> 
> > >
> > > If so is there any good reason to keep dw_pcie_ep_raise_msix_irq?
> > > (And thus use it in dw_plat_pcie_ep_raise_irq also)?
> > I am not sure, but I think the dw_pcie_ep_raise_msix_irq function is
> > not correct, because I don't think we can get the MSIX table from the
> > address ep->phys_base + tbl_addr; I also don't know where to get the
> > correct MSIX table.
> 
> Well it looks like this function is used by snps,dw-pcie-ep and snps,dw-pcie,
> perhaps the doorbell mode isn't available on that hardware.
> 
> > >
> > >
> > > >  {
> > > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h
> > > > b/drivers/pci/controller/dwc/pcie-designware.h
> > > > index 2b291e8..cd903e9 100644
> > > > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > > > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > > > @@ -88,6 +88,11 @@
> > > >  #define PCIE_MISC_CONTROL_1_OFF0x8BC
> > > >  #define PCIE_DBI_RO_WR_EN  BIT(0)
> > > >
> > > > +#define PCIE_MSIX_DOORBELL 0x948
> > > > +#define PCIE_MSIX_DOORBELL_PF_SHIFT24
> > > > +#define PCIE_MSIX_DOORBELL_VF_SHIFT16
> > > > +#define PCIE_MSIX_DOORBELL_VF_ACTIVE   BIT(15)
> > >
> > > The _VF defines are not used, I'd suggest removing them.
> > In fact, I will add SR-IOV support in this file; the SR-IOV feature
> > has been verified on my board, but I need to wait for the EP framework
> > SR-IOV patch to merge, so I defined these two macros.
> 
> I'd suggest adding the VF macros along with the SRIOV feature.
OK, I will remove these two macros. Thanks.
> 
> Thanks,
> 
> Andrew Murray
> 
> > >
> > > Thanks,
> > >
> > > Andrew Murray
> > >
> > > > +
> > > >  /*

RE: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for DWC

2019-08-16 Thread Xiaowei Bao


> -Original Message-
> From: Andrew Murray 
> Sent: 16 August 2019 17:45
> To: Xiaowei Bao 
> Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> mark.rutl...@arm.com; shawn...@kernel.org; Leo Li
> ; kis...@ti.com; lorenzo.pieral...@arm.com;
> a...@arndb.de; gre...@linuxfoundation.org; M.h. Lian
> ; Roy Zang ;
> linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> linuxppc-dev@lists.ozlabs.org; Z.q. Hou 
> Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for
> DWC
> 
> On Fri, Aug 16, 2019 at 02:55:41AM +, Xiaowei Bao wrote:
> >
> >
> > > -Original Message-
> > > From: Andrew Murray 
> > > Sent: 15 August 2019 19:32
> > > To: Xiaowei Bao 
> > > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> > > shawn...@kernel.org; Leo Li ; kis...@ti.com;
> > > lorenzo.pieral...@arm.com; a...@arndb.de;
> > > gre...@linuxfoundation.org; M.h. Lian ;
> > > Mingkai Hu ; Roy Zang ;
> > > linux-...@vger.kernel.org; devicet...@vger.kernel.org;
> > > linux-ker...@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> > > linuxppc-dev@lists.ozlabs.org
> > > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs
> > > support for DWC
> > >
> > > On Thu, Aug 15, 2019 at 04:37:07PM +0800, Xiaowei Bao wrote:
> > > > Add multiple-PF support for DWC. Different PFs have different
> > > > config spaces; use the pf-offset property from the DTS to access
> > > > each PF's config space.
> > >
> > > Thanks for the patch. I haven't seen a cover letter for this series,
> > > is there one missing?
> > Maybe I missed it; I will add you as a reviewer next time. Thanks a
> > lot for your comments.
> > >
> > >
> > > >
> > > > Signed-off-by: Xiaowei Bao 
> > > > ---
> > > >  drivers/pci/controller/dwc/pcie-designware-ep.c |  97
> > > +-
> > > >  drivers/pci/controller/dwc/pcie-designware.c| 105
> > > ++--
> > > >  drivers/pci/controller/dwc/pcie-designware.h|  10 ++-
> > > >  include/linux/pci-epc.h |   1 +
> > > >  4 files changed, 164 insertions(+), 49 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > index 2bf5a35..75e2955 100644
> > > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > > @@ -19,12 +19,14 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep
> *ep)
> > > > pci_epc_linkup(epc);
> > > >  }
> > > >
> > > > -static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum
> > > > pci_barno
> > > bar,
> > > > -  int flags)
> > > > +static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8 func_no,
> > > > +  enum pci_barno bar, int flags)
> > > >  {
> > > > u32 reg;
> > > > +   struct pci_epc *epc = pci->ep.epc;
> > > > +   u32 pf_base = func_no * epc->pf_offset;
> > > >
> > > > -   reg = PCI_BASE_ADDRESS_0 + (4 * bar);
> > > > +   reg = pf_base + PCI_BASE_ADDRESS_0 + (4 * bar);
> > >
> > > I think I'd rather see this arithmetic (and the one for determining
> > > pf_base) inside a macro or inline header function. This would make
> > > this code more readable and reduce the chances of an error by avoiding
> duplication of code.
> > >
> > > For example look at cdns_pcie_ep_fn_writeb and
> > > ROCKCHIP_PCIE_EP_FUNC_BASE for examples of other EP drivers that do
> > > this.
> > Agreed, this looks fine. Thanks a lot for your comments; I will use
> > this way to access the registers in the next version of the patch.
> > >
> > >
> > > > dw_pcie_dbi_ro_wr_en(pci);
> > > > dw_pcie_writel_dbi2(pci, reg, 0x0);
> > > > dw_pcie_writel_dbi(pci, reg, 0x0); @@ -37,7 +39,12 @@ static
> > > > void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno
> > > > bar,
> > > >
> > > >  void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)
> {
> > > > -   __dw_pcie_ep_reset_bar(pci, bar, 0);
> > > > +   u8 func_no, funcs;
> > > > +
> > > > +   funcs = pci->ep.epc->max_functions;
> > > > +
> > > > +   for (func_no = 0; func_no < funcs; func_no++)
> > > > +   __dw_pcie_ep_reset_bar(pci, func_no, bar, 0);
> > > >  }
> > > >
> > > >  static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8
> > > > cap_ptr, @@ -78,28 +85,29 @@ static int
> > > > dw_pcie_ep_write_header(struct pci_epc *epc, u8 func_no,  {
> > > > struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > > > struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > > +   u32 pf_base = func_no * epc->pf_offset;
> > > >
> > > > dw_pcie_dbi_ro_wr_en(pci);
> > > > -   dw_pcie_writew_dbi(pci, PCI_VENDOR_ID, hdr->vendorid);
> > > > -   dw_pcie_writew_dbi(pci, PCI_DEVICE_ID, 

Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode

2019-08-16 Thread Kishon Vijay Abraham I
Hi,

On 16/08/19 8:28 AM, Xiaowei Bao wrote:
> 
> 
>> -Original Message-
>> From: Andrew Murray 
>> Sent: 15 August 2019 19:54
>> To: Xiaowei Bao 
>> Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
>> bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
>> shawn...@kernel.org; Leo Li ; kis...@ti.com;
>> lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org;
>> M.h. Lian ; Mingkai Hu ;
>> Roy Zang ; linux-...@vger.kernel.org;
>> devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
>> linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org
>> Subject: Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of
>> MSI-X in EP mode
>>
>> On Thu, Aug 15, 2019 at 04:37:08PM +0800, Xiaowei Bao wrote:
>>> Add the doorbell mode of MSI-X in EP mode.
>>>
>>> Signed-off-by: Xiaowei Bao 
>>> ---
>>>  drivers/pci/controller/dwc/pcie-designware-ep.c | 14 ++
>>>  drivers/pci/controller/dwc/pcie-designware.h| 14 ++
>>>  2 files changed, 28 insertions(+)
>>>
>>> diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
>>> b/drivers/pci/controller/dwc/pcie-designware-ep.c
>>> index 75e2955..e3a7cdf 100644
>>> --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
>>> +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
>>> @@ -454,6 +454,20 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep
>> *ep, u8 func_no,
>>> return 0;
>>>  }
>>>
>>> +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8
>> func_no,
>>> +  u16 interrupt_num)
>>> +{
>>> +   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
>>> +   u32 msg_data;
>>> +
>>> +   msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) |
>>> +  (interrupt_num - 1);
>>> +
>>> +   dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data);
>>> +
>>> +   return 0;
>>> +}
>>> +
>>>  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
>>>   u16 interrupt_num)
>>
>> Have I understood correctly that the hardware provides an alternative
>> mechanism that allows for raising MSI-X interrupts without the bother of
>> reading the capabilities registers?
> Yes, the hardware provides two ways to raise MSI-X; please check page 492
> of the DWC_pcie_dm_registers_4.30 manual:
> MSIX_DOORBELL_OFF on page 492, 0x948, Description: MSI-X Doorbell Register
>>
>> If so is there any good reason to keep dw_pcie_ep_raise_msix_irq? (And thus
>> use it in dw_plat_pcie_ep_raise_irq also)?
> I am not sure, but I think the dw_pcie_ep_raise_msix_irq function is not
> correct, because I don't think we can get the MSIX table from the address
> ep->phys_base + tbl_addr; I also don't know where to get the correct MSIX
> table.

Some time back, when I tried raising MSI-X from the EP, it was failing. It's
quite possible the dw_pcie_ep_raise_msix_irq function is not correct.

MSI-X table can be obtained from the inbound ATU corresponding to the MSIX bar.
IMO MSI-X support in EP mode needs rework. For instance set_msix should also
take BAR number as input to be configured in the MSI-X capability. The function
driver (pci-epf-test.c) should allocate memory taking into account the MSI-X 
table.

Thanks
Kishon


Re: [PATCH 05/10] PCI: layerscape: Modify the way of getting capability with different PEX

2019-08-16 Thread Andrew Murray
On Fri, Aug 16, 2019 at 03:00:00AM +, Xiaowei Bao wrote:
> 
> 
> > -Original Message-
> > From: Andrew Murray 
> > Sent: 15 August 2019 20:51
> > To: Xiaowei Bao 
> > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> > shawn...@kernel.org; Leo Li ; kis...@ti.com;
> > lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org;
> > M.h. Lian ; Mingkai Hu ;
> > Roy Zang ; linux-...@vger.kernel.org;
> > devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> > linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH 05/10] PCI: layerscape: Modify the way of getting
> > capability with different PEX
> > 
> > On Thu, Aug 15, 2019 at 04:37:11PM +0800, Xiaowei Bao wrote:
> > > Different PCIe controllers on one board may have different MSI or
> > > MSI-X capabilities, so change the way the MSI capability is
> > > retrieved to make it more flexible.
> > >
> > > Signed-off-by: Xiaowei Bao 
> > > ---
> > >  drivers/pci/controller/dwc/pci-layerscape-ep.c | 28
> > > +++---
> > >  1 file changed, 21 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > index be61d96..9404ca0 100644
> > > --- a/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > +++ b/drivers/pci/controller/dwc/pci-layerscape-ep.c
> > > @@ -22,6 +22,7 @@
> > >
> > >  struct ls_pcie_ep {
> > >   struct dw_pcie  *pci;
> > > + struct pci_epc_features *ls_epc;
> > >  };
> > >
> > >  #define to_ls_pcie_ep(x) dev_get_drvdata((x)->dev)
> > > @@ -40,25 +41,26 @@ static const struct of_device_id
> > ls_pcie_ep_of_match[] = {
> > >   { },
> > >  };
> > >
> > > -static const struct pci_epc_features ls_pcie_epc_features = {
> > > - .linkup_notifier = false,
> > > - .msi_capable = true,
> > > - .msix_capable = false,
> > > -};
> > > -
> > >  static const struct pci_epc_features*  ls_pcie_ep_get_features(struct
> > > dw_pcie_ep *ep)  {
> > > - return &ls_pcie_epc_features;
> > > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> > > +
> > > + return pcie->ls_epc;
> > >  }
> > >
> > >  static void ls_pcie_ep_init(struct dw_pcie_ep *ep)  {
> > >   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > + struct ls_pcie_ep *pcie = to_ls_pcie_ep(pci);
> > >   enum pci_barno bar;
> > >
> > >   for (bar = BAR_0; bar <= BAR_5; bar++)
> > >   dw_pcie_ep_reset_bar(pci, bar);
> > > +
> > > + pcie->ls_epc->msi_capable = ep->msi_cap ? true : false;
> > > + pcie->ls_epc->msix_capable = ep->msix_cap ? true : false;
> > >  }
> > >
> > >  static int ls_pcie_ep_raise_irq(struct dw_pcie_ep *ep, u8 func_no, @@
> > > -118,6 +120,7 @@ static int __init ls_pcie_ep_probe(struct platform_device
> > *pdev)
> > >   struct device *dev = &pdev->dev;
> > >   struct dw_pcie *pci;
> > >   struct ls_pcie_ep *pcie;
> > > + struct pci_epc_features *ls_epc;
> > >   struct resource *dbi_base;
> > >   int ret;
> > >
> > > @@ -129,6 +132,10 @@ static int __init ls_pcie_ep_probe(struct
> > platform_device *pdev)
> > >   if (!pci)
> > >   return -ENOMEM;
> > >
> > > + ls_epc = devm_kzalloc(dev, sizeof(*ls_epc), GFP_KERNEL);
> > > + if (!ls_epc)
> > > + return -ENOMEM;
> > > +
> > >   dbi_base = platform_get_resource_byname(pdev, IORESOURCE_MEM,
> > "regs");
> > >   pci->dbi_base = devm_pci_remap_cfg_resource(dev, dbi_base);
> > >   if (IS_ERR(pci->dbi_base))
> > > @@ -139,6 +146,13 @@ static int __init ls_pcie_ep_probe(struct
> > platform_device *pdev)
> > >   pci->ops = &ls_pcie_ep_ops;
> > >   pcie->pci = pci;
> > >
> > > + ls_epc->linkup_notifier = false,
> > > + ls_epc->msi_capable = true,
> > > + ls_epc->msix_capable = true,
> > 
> > As [msi,msix]_capable is shortly set from ls_pcie_ep_init - is there any 
> > reason
> > to set them here (to potentially incorrect values)?
> This is an initial value; maybe false is better for msi_capable and
> msix_capable. Of course, we don't need to set it.

ls_epc is kzalloc'd and so all zeros, so you get false for free. I think you
can remove these two lines (or all three if you don't care that linkup_notifier
isn't explicitly set).

Thanks,

Andrew Murray

> > 
> > Thanks,
> > 
> > Andrew Murray
> > 
> > > + ls_epc->bar_fixed_64bit = (1 << BAR_2) | (1 << BAR_4),
> > > +
> > > + pcie->ls_epc = ls_epc;
> > > +
> > >   platform_set_drvdata(pdev, pcie);
> > >
> > >   ret = ls_add_pcie_ep(pcie, pdev);
> > > --
> > > 2.9.5
> > >


Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of MSI-X in EP mode

2019-08-16 Thread Andrew Murray
On Fri, Aug 16, 2019 at 02:58:31AM +, Xiaowei Bao wrote:
> 
> 
> > -Original Message-
> > From: Andrew Murray 
> > Sent: 15 August 2019 19:54
> > To: Xiaowei Bao 
> > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> > shawn...@kernel.org; Leo Li ; kis...@ti.com;
> > lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org;
> > M.h. Lian ; Mingkai Hu ;
> > Roy Zang ; linux-...@vger.kernel.org;
> > devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> > linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH 02/10] PCI: designware-ep: Add the doorbell mode of
> > MSI-X in EP mode
> > 
> > On Thu, Aug 15, 2019 at 04:37:08PM +0800, Xiaowei Bao wrote:
> > > Add the doorbell mode of MSI-X in EP mode.
> > >
> > > Signed-off-by: Xiaowei Bao 
> > > ---
> > >  drivers/pci/controller/dwc/pcie-designware-ep.c | 14 ++
> > >  drivers/pci/controller/dwc/pcie-designware.h| 14 ++
> > >  2 files changed, 28 insertions(+)
> > >
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > index 75e2955..e3a7cdf 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > @@ -454,6 +454,20 @@ int dw_pcie_ep_raise_msi_irq(struct dw_pcie_ep
> > *ep, u8 func_no,
> > >   return 0;
> > >  }
> > >
> > > +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8
> > func_no,
> > > +u16 interrupt_num)
> > > +{
> > > + struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > + u32 msg_data;
> > > +
> > > + msg_data = (func_no << PCIE_MSIX_DOORBELL_PF_SHIFT) |
> > > +(interrupt_num - 1);
> > > +
> > > + dw_pcie_writel_dbi(pci, PCIE_MSIX_DOORBELL, msg_data);
> > > +
> > > + return 0;
> > > +}
> > > +
> > >  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> > > u16 interrupt_num)
> > 
> > Have I understood correctly that the hardware provides an alternative
> > mechanism that allows for raising MSI-X interrupts without the bother of
> > reading the capabilities registers?
> Yes, the hardware provides two ways to raise MSI-X; please check page 492
> of the DWC_pcie_dm_registers_4.30 manual:
> MSIX_DOORBELL_OFF on page 492, 0x948, Description: MSI-X Doorbell Register

Thanks for the reference.

> > 
> > If so is there any good reason to keep dw_pcie_ep_raise_msix_irq? (And thus
> > use it in dw_plat_pcie_ep_raise_irq also)?
> I am not sure, but I think the dw_pcie_ep_raise_msix_irq function is not
> correct, because I don't think we can get the MSIX table from the address
> ep->phys_base + tbl_addr; I also don't know where to get the correct MSIX
> table.

Well it looks like this function is used by snps,dw-pcie-ep and snps,dw-pcie,
perhaps the doorbell mode isn't available on that hardware.

> > 
> > 
> > >  {
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware.h
> > > b/drivers/pci/controller/dwc/pcie-designware.h
> > > index 2b291e8..cd903e9 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware.h
> > > +++ b/drivers/pci/controller/dwc/pcie-designware.h
> > > @@ -88,6 +88,11 @@
> > >  #define PCIE_MISC_CONTROL_1_OFF  0x8BC
> > >  #define PCIE_DBI_RO_WR_ENBIT(0)
> > >
> > > +#define PCIE_MSIX_DOORBELL   0x948
> > > +#define PCIE_MSIX_DOORBELL_PF_SHIFT  24
> > > +#define PCIE_MSIX_DOORBELL_VF_SHIFT  16
> > > +#define PCIE_MSIX_DOORBELL_VF_ACTIVE BIT(15)
> > 
> > The _VF defines are not used, I'd suggest removing them.
> In fact, I will add SR-IOV support in this file; the SR-IOV feature has
> been verified on my board, but I need to wait for the EP framework
> SR-IOV patch to merge, so I defined these two macros.

I'd suggest adding the VF macros along with the SRIOV feature.

Thanks,

Andrew Murray

> > 
> > Thanks,
> > 
> > Andrew Murray
> > 
> > > +
> > >  /*
> > >   * iATU Unroll-specific register definitions
> > >   * From 4.80 core version the address translation will be made by
> > > unroll @@ -399,6 +404,8 @@ int dw_pcie_ep_raise_msi_irq(struct
> > dw_pcie_ep *ep, u8 func_no,
> > >u8 interrupt_num);
> > >  int dw_pcie_ep_raise_msix_irq(struct dw_pcie_ep *ep, u8 func_no,
> > >u16 interrupt_num);
> > > +int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep *ep, u8
> > func_no,
> > > +u16 interrupt_num);
> > >  void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar);
> > > #else  static inline void dw_pcie_ep_linkup(struct dw_pcie_ep *ep) @@
> > > -431,6 +438,13 @@ static inline int dw_pcie_ep_raise_msix_irq(struct
> > dw_pcie_ep *ep, u8 func_no,
> > >   return 0;
> > >  }
> > >
> > > +static inline int dw_pcie_ep_raise_msix_irq_doorbell(struct dw_pcie_ep
> > *ep,
> > > +  

Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for DWC

2019-08-16 Thread Andrew Murray
On Fri, Aug 16, 2019 at 02:55:41AM +, Xiaowei Bao wrote:
> 
> 
> > -Original Message-
> > From: Andrew Murray 
> > Sent: 15 August 2019 19:32
> > To: Xiaowei Bao 
> > Cc: jingooh...@gmail.com; gustavo.pimen...@synopsys.com;
> > bhelg...@google.com; robh...@kernel.org; mark.rutl...@arm.com;
> > shawn...@kernel.org; Leo Li ; kis...@ti.com;
> > lorenzo.pieral...@arm.com; a...@arndb.de; gre...@linuxfoundation.org;
> > M.h. Lian ; Mingkai Hu ;
> > Roy Zang ; linux-...@vger.kernel.org;
> > devicet...@vger.kernel.org; linux-ker...@vger.kernel.org;
> > linux-arm-ker...@lists.infradead.org; linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH 01/10] PCI: designware-ep: Add multiple PFs support for
> > DWC
> > 
> > On Thu, Aug 15, 2019 at 04:37:07PM +0800, Xiaowei Bao wrote:
> > > Add multiple-PF support for DWC. Different PFs have different config
> > > spaces; use the pf-offset property from the DTS to access each PF's
> > > config space.
> > 
> > Thanks for the patch. I haven't seen a cover letter for this series, is 
> > there one
> > missing?
> Maybe I missed it; I will add you as a reviewer next time. Thanks a lot
> for your comments.
> > 
> > 
> > >
> > > Signed-off-by: Xiaowei Bao 
> > > ---
> > >  drivers/pci/controller/dwc/pcie-designware-ep.c |  97
> > +-
> > >  drivers/pci/controller/dwc/pcie-designware.c| 105
> > ++--
> > >  drivers/pci/controller/dwc/pcie-designware.h|  10 ++-
> > >  include/linux/pci-epc.h |   1 +
> > >  4 files changed, 164 insertions(+), 49 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > index 2bf5a35..75e2955 100644
> > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c
> > > @@ -19,12 +19,14 @@ void dw_pcie_ep_linkup(struct dw_pcie_ep *ep)
> > >   pci_epc_linkup(epc);
> > >  }
> > >
> > > -static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno
> > bar,
> > > -int flags)
> > > +static void __dw_pcie_ep_reset_bar(struct dw_pcie *pci, u8 func_no,
> > > +enum pci_barno bar, int flags)
> > >  {
> > >   u32 reg;
> > > + struct pci_epc *epc = pci->ep.epc;
> > > + u32 pf_base = func_no * epc->pf_offset;
> > >
> > > - reg = PCI_BASE_ADDRESS_0 + (4 * bar);
> > > + reg = pf_base + PCI_BASE_ADDRESS_0 + (4 * bar);
> > 
> > I think I'd rather see this arithmetic (and the one for determining pf_base)
> > inside a macro or inline header function. This would make this code more
> > readable and reduce the chances of an error by avoiding duplication of code.
> > 
> > For example look at cdns_pcie_ep_fn_writeb and
> > ROCKCHIP_PCIE_EP_FUNC_BASE for examples of other EP drivers that do
> > this.
> Agreed, this looks fine. Thanks a lot for your comments; I will use this
> way to access the registers in the next version of the patch.
> > 
> > 
> > >   dw_pcie_dbi_ro_wr_en(pci);
> > >   dw_pcie_writel_dbi2(pci, reg, 0x0);
> > >   dw_pcie_writel_dbi(pci, reg, 0x0);
> > > @@ -37,7 +39,12 @@ static void __dw_pcie_ep_reset_bar(struct dw_pcie
> > > *pci, enum pci_barno bar,
> > >
> > >  void dw_pcie_ep_reset_bar(struct dw_pcie *pci, enum pci_barno bar)  {
> > > - __dw_pcie_ep_reset_bar(pci, bar, 0);
> > > + u8 func_no, funcs;
> > > +
> > > + funcs = pci->ep.epc->max_functions;
> > > +
> > > + for (func_no = 0; func_no < funcs; func_no++)
> > > + __dw_pcie_ep_reset_bar(pci, func_no, bar, 0);
> > >  }
> > >
> > >  static u8 __dw_pcie_ep_find_next_cap(struct dw_pcie *pci, u8 cap_ptr,
> > > @@ -78,28 +85,29 @@ static int dw_pcie_ep_write_header(struct pci_epc
> > > *epc, u8 func_no,  {
> > >   struct dw_pcie_ep *ep = epc_get_drvdata(epc);
> > >   struct dw_pcie *pci = to_dw_pcie_from_ep(ep);
> > > + u32 pf_base = func_no * epc->pf_offset;
> > >
> > >   dw_pcie_dbi_ro_wr_en(pci);
> > > - dw_pcie_writew_dbi(pci, PCI_VENDOR_ID, hdr->vendorid);
> > > - dw_pcie_writew_dbi(pci, PCI_DEVICE_ID, hdr->deviceid);
> > > - dw_pcie_writeb_dbi(pci, PCI_REVISION_ID, hdr->revid);
> > > - dw_pcie_writeb_dbi(pci, PCI_CLASS_PROG, hdr->progif_code);
> > > - dw_pcie_writew_dbi(pci, PCI_CLASS_DEVICE,
> > > + dw_pcie_writew_dbi(pci, pf_base + PCI_VENDOR_ID, hdr->vendorid);
> > > + dw_pcie_writew_dbi(pci, pf_base + PCI_DEVICE_ID, hdr->deviceid);
> > > + dw_pcie_writeb_dbi(pci, pf_base + PCI_REVISION_ID, hdr->revid);
> > > + dw_pcie_writeb_dbi(pci, pf_base + PCI_CLASS_PROG, hdr->progif_code);
> > > + dw_pcie_writew_dbi(pci, pf_base + PCI_CLASS_DEVICE,
> > >  hdr->subclass_code | hdr->baseclass_code << 8);
> > > - dw_pcie_writeb_dbi(pci, PCI_CACHE_LINE_SIZE,
> > > + dw_pcie_writeb_dbi(pci, pf_base + PCI_CACHE_LINE_SIZE,
> > >  hdr->cache_line_size);
> > > - dw_pcie_writew_dbi(pci, PCI_SUBSYSTEM_VENDOR_ID,
> > > + dw_pcie_writew_dbi(pci, pf_base + PCI_SUBSYSTEM_VENDOR_ID,
> > > 

[Bug 204371] BUG kmalloc-4k (Tainted: G W ): Object padding overwritten

2019-08-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=204371

--- Comment #32 from Christophe Leroy (christophe.le...@c-s.fr) ---
I think the first thing is to fix test_add_free_space_entry():
- replace map = kzalloc(...) with map = (void *)get_zeroed_page(...) as in
other places.
- replace kfree(map); with free_page((unsigned long)map);

Then see if the WARNING on kfree() in btrfs_free_dummy_fs_info() is still
there.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

Re: [PATCH v4 3/3] x86/kasan: support KASAN_VMALLOC

2019-08-16 Thread Christophe Leroy




On 15/08/2019 at 02:16, Daniel Axtens wrote:

In the case where KASAN directly allocates memory to back vmalloc
space, don't map the early shadow page over it.


If the early shadow page is not mapped, any bad memory access will Oops on 
the shadow access instead of Oopsing on the real bad access.


You should still map the early shadow page, and replace it with a real page 
when needed.


Christophe



We prepopulate pgds/p4ds for the range that would otherwise be empty.
This is required to get it synced to hardware on boot, allowing the
lower levels of the page tables to be filled dynamically.

Acked-by: Dmitry Vyukov 
Signed-off-by: Daniel Axtens 

---

v2: move from faulting in shadow pgds to prepopulating
---
  arch/x86/Kconfig|  1 +
  arch/x86/mm/kasan_init_64.c | 61 +
  2 files changed, 62 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 222855cc0158..40562cc3771f 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -134,6 +134,7 @@ config X86
select HAVE_ARCH_JUMP_LABEL
select HAVE_ARCH_JUMP_LABEL_RELATIVE
select HAVE_ARCH_KASAN  if X86_64
+   select HAVE_ARCH_KASAN_VMALLOC  if X86_64
select HAVE_ARCH_KGDB
select HAVE_ARCH_MMAP_RND_BITS  if MMU
select HAVE_ARCH_MMAP_RND_COMPAT_BITS   if MMU && COMPAT
diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 296da58f3013..2f57c4ddff61 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -245,6 +245,52 @@ static void __init kasan_map_early_shadow(pgd_t *pgd)
} while (pgd++, addr = next, addr != end);
  }
  
+static void __init kasan_shallow_populate_p4ds(pgd_t *pgd,
+   unsigned long addr,
+   unsigned long end,
+   int nid)
+{
+   p4d_t *p4d;
+   unsigned long next;
+   void *p;
+
+   p4d = p4d_offset(pgd, addr);
+   do {
+   next = p4d_addr_end(addr, end);
+
+   if (p4d_none(*p4d)) {
+   p = early_alloc(PAGE_SIZE, nid, true);
+   p4d_populate(&init_mm, p4d, p);
+   }
+   } while (p4d++, addr = next, addr != end);
+}
+
+static void __init kasan_shallow_populate_pgds(void *start, void *end)
+{
+   unsigned long addr, next;
+   pgd_t *pgd;
+   void *p;
+   int nid = early_pfn_to_nid((unsigned long)start);
+
+   addr = (unsigned long)start;
+   pgd = pgd_offset_k(addr);
+   do {
+   next = pgd_addr_end(addr, (unsigned long)end);
+
+   if (pgd_none(*pgd)) {
+   p = early_alloc(PAGE_SIZE, nid, true);
+   pgd_populate(&init_mm, pgd, p);
+   }
+
+   /*
+* we need to populate p4ds to be synced when running in
+* four level mode - see sync_global_pgds_l4()
+*/
+   kasan_shallow_populate_p4ds(pgd, addr, next, nid);
+   } while (pgd++, addr = next, addr != (unsigned long)end);
+}
+
+
  #ifdef CONFIG_KASAN_INLINE
  static int kasan_die_handler(struct notifier_block *self,
 unsigned long val,
@@ -352,9 +398,24 @@ void __init kasan_init(void)
shadow_cpu_entry_end = (void *)round_up(
(unsigned long)shadow_cpu_entry_end, PAGE_SIZE);
  
+	/*
+* If we're in full vmalloc mode, don't back vmalloc space with early
+* shadow pages. Instead, prepopulate pgds/p4ds so they are synced to
+* the global table and we can populate the lower levels on demand.
+*/
+#ifdef CONFIG_KASAN_VMALLOC
+   kasan_shallow_populate_pgds(
+   kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
+   kasan_mem_to_shadow((void *)VMALLOC_END));
+
+   kasan_populate_early_shadow(
+   kasan_mem_to_shadow((void *)VMALLOC_END + 1),
+   shadow_cpu_entry_begin);
+#else
kasan_populate_early_shadow(
kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
shadow_cpu_entry_begin);
+#endif
  
	kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin,
  (unsigned long)shadow_cpu_entry_end, 0);



[PATCH] powerpc/32: Add warning on misaligned copy_page() or clear_page()

2019-08-16 Thread Christophe Leroy
copy_page() and clear_page() expect a page-aligned destination, and
use the dcbz instruction to clear entire cache lines based on the
assumption that the destination is cache aligned.

As shown during the analysis of a bug in the BTRFS filesystem, a misaligned
copy_page() can create bugs that are difficult to locate (see Link).

Add an explicit WARNING when copy_page() or clear_page() is called
with a misaligned destination.

Signed-off-by: Christophe Leroy 
Cc: Erhard F. 
Link: https://bugzilla.kernel.org/show_bug.cgi?id=204371
---
 arch/powerpc/include/asm/page_32.h | 4 
 arch/powerpc/kernel/misc_32.S  | 5 +
 2 files changed, 9 insertions(+)

diff --git a/arch/powerpc/include/asm/page_32.h 
b/arch/powerpc/include/asm/page_32.h
index 683dfbc67ca8..d64dfe3ac712 100644
--- a/arch/powerpc/include/asm/page_32.h
+++ b/arch/powerpc/include/asm/page_32.h
@@ -40,6 +40,8 @@ typedef unsigned long long pte_basic_t;
 typedef unsigned long pte_basic_t;
 #endif
 
+#include 
+
 /*
  * Clear page using the dcbz instruction, which doesn't cause any
  * memory traffic (except to write out any cache lines which get
@@ -49,6 +51,8 @@ static inline void clear_page(void *addr)
 {
unsigned int i;
 
+   WARN_ON((unsigned long)addr & (L1_CACHE_BYTES - 1));
+
for (i = 0; i < PAGE_SIZE / L1_CACHE_BYTES; i++, addr += L1_CACHE_BYTES)
dcbz(addr);
 }
diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index fe4bd321730e..02d90e1ebf65 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -452,7 +452,12 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
stwur9,16(r3)
 
 _GLOBAL(copy_page)
+   rlwinm  r5, r3, 0, L1_CACHE_BYTES - 1
addir3,r3,-4
+
+0: twnei   r5, 0   /* WARN if r3 is not cache aligned */
+   EMIT_BUG_ENTRY 0b,__FILE__,__LINE__, BUGFLAG_WARNING
+
addir4,r4,-4
 
li  r5,4
-- 
2.13.3



Re: [PATCH v4 1/3] kasan: support backing vmalloc space with real shadow memory

2019-08-16 Thread Christophe Leroy




On 15/08/2019 at 02:16, Daniel Axtens wrote:

Hook into vmalloc and vmap, and dynamically allocate real shadow
memory to back the mappings.

Most mappings in vmalloc space are small, requiring less than a full
page of shadow space. Allocating a full shadow page per mapping would
therefore be wasteful. Furthermore, to ensure that different mappings
use different shadow pages, mappings would have to be aligned to
KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE.

Instead, share backing space across multiple mappings. Allocate
a backing page the first time a mapping in vmalloc space uses a
particular page of the shadow region. Keep this page around
regardless of whether the mapping is later freed - in the mean time
the page could have become shared by another vmalloc mapping.

This can in theory lead to unbounded memory growth, but the vmalloc
allocator is pretty good at reusing addresses, so the practical memory
usage grows at first but then stays fairly stable.


I guess people with gigabytes of memory don't mind, but I'm concerned 
about tiny targets with very little memory. I have boards with 
as little as 32Mbytes of RAM. The shadow region for the linear space 
already takes one eighth of the RAM. I'd rather avoid keeping unused 
shadow pages busy.


Each page of shadow memory represents 8 pages of real memory. Could we 
use page_ref to count how many pieces of a shadow page are used, so that 
we can free it when the ref count decreases to 0?




This requires architecture support to actually use: arches must stop
mapping the read-only zero page over portion of the shadow region that
covers the vmalloc space and instead leave it unmapped.


Why 'must'? Couldn't we switch back and forth from the zero page to a 
real page on demand?


If the zero page is not mapped for unused vmalloc space, bad memory 
accesses will Oops on the shadow memory access instead of Oopsing on the 
real bad access, making it more difficult to locate and identify the issue.




This allows KASAN with VMAP_STACK, and will be needed for architectures
that do not have a separate module space (e.g. powerpc64, which I am
currently working on). It also allows relaxing the module alignment
back to PAGE_SIZE.


Why 'needed'? powerpc32 doesn't have a separate module space and 
doesn't need that.




Link: https://bugzilla.kernel.org/show_bug.cgi?id=202009
Acked-by: Vasily Gorbik 
Signed-off-by: Daniel Axtens 
[Mark: rework shadow allocation]
Signed-off-by: Mark Rutland 

--

v2: let kasan_unpoison_shadow deal with ranges that do not use a
 full shadow byte.

v3: relax module alignment
 rename to kasan_populate_vmalloc which is a much better name
 deal with concurrency correctly

v4: Integrate Mark's rework
 Poison pages on vfree
 Handle allocation failures. I've tested this by inserting artificial
  failures and using test_vmalloc to stress it. I haven't handled the
  per-cpu case: it looked like it would require a messy hacking-up of
  the function to deal with an OOM failure case in a debug feature.

---
  Documentation/dev-tools/kasan.rst | 60 +++
  include/linux/kasan.h | 24 +++
  include/linux/moduleloader.h  |  2 +-
  include/linux/vmalloc.h   | 12 ++
  lib/Kconfig.kasan | 16 
  lib/test_kasan.c  | 26 
  mm/kasan/common.c | 67 +++
  mm/kasan/generic_report.c |  3 ++
  mm/kasan/kasan.h  |  1 +
  mm/vmalloc.c  | 28 -
  10 files changed, 237 insertions(+), 2 deletions(-)

diff --git a/Documentation/dev-tools/kasan.rst 
b/Documentation/dev-tools/kasan.rst
index b72d07d70239..35fda484a672 100644
--- a/Documentation/dev-tools/kasan.rst
+++ b/Documentation/dev-tools/kasan.rst
@@ -215,3 +215,63 @@ brk handler is used to print bug reports.
  A potential expansion of this mode is a hardware tag-based mode, which would
  use hardware memory tagging support instead of compiler instrumentation and
  manual shadow memory manipulation.
+
+What memory accesses are sanitised by KASAN?
+
+
+The kernel maps memory in a number of different parts of the address
+space. This poses something of a problem for KASAN, which requires
+that all addresses accessed by instrumented code have a valid shadow
+region.
+
+The range of kernel virtual addresses is large: there is not enough
+real memory to support a real shadow region for every address that
+could be accessed by the kernel.
+
+By default
+~~~~~~~~~~
+
+By default, architectures only map real memory over the shadow region
+for the linear mapping (and potentially other small areas). For all
+other areas - such as vmalloc and vmemmap space - a single read-only
+page is mapped over the shadow area. This read-only shadow page
+declares all memory accesses as permitted.
+
+This presents a problem for modules: they do not 

Re: [PATCH 4/6] dma-mapping: remove arch_dma_mmap_pgprot

2019-08-16 Thread Geert Uytterhoeven
On Fri, Aug 16, 2019 at 9:19 AM Christoph Hellwig  wrote:
> arch_dma_mmap_pgprot is used for two things:
>
>  1) to override the "normal" uncached page attributes for mapping
> memory coherent to devices that can't snoop the CPU caches
>  2) to provide the special DMA_ATTR_WRITE_COMBINE semantics on older
> arm systems
>
> Replace one with the pgprot_dmacoherent macro that is already provided
> by arm and much simpler to use, and lift the DMA_ATTR_WRITE_COMBINE
> handling to common code with an explicit arch opt-in.
>
> Signed-off-by: Christoph Hellwig 

>  arch/m68k/Kconfig  |  1 -
>  arch/m68k/include/asm/pgtable_mm.h |  3 +++
>  arch/m68k/kernel/dma.c |  3 +--

Acked-by: Geert Uytterhoeven 

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


Re: [PATCH 6/6] driver core: initialize a default DMA mask for platform device

2019-08-16 Thread Geert Uytterhoeven
Hi Christoph,

On Fri, Aug 16, 2019 at 8:30 AM Christoph Hellwig  wrote:
> We still treat devices without a DMA mask as defaulting to 32-bits for
> both masks, but a few releases ago we started warning about such
> cases, as they require special cases to work around this sloppiness.
> Add a dma_mask field to struct platform_device so that we can initialize
> the dma_mask pointer in struct device and initialize both masks to
> 32-bits by default, replacing similar functionality in m68k and
> powerpc.  The arch_setup_pdev_archdata hook is now unused and removed.
>
> Note that the code looks a little odd with the various conditionals
> because we have to support platform_device structures that are
> statically allocated.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/m68k/kernel/dma.c   |  9 ---

Acked-by: Geert Uytterhoeven 

>  arch/sh/boards/mach-ecovec24/setup.c |  2 --
>  arch/sh/boards/mach-migor/setup.c|  1 -

Acked-by: Geert Uytterhoeven 
given "[PATCH 0/2] Remove calls to empty arch_setup_pdev_archdata()"
https://lore.kernel.org/linux-renesas-soc/1526641611-2769-1-git-send-email-geert+rene...@glider.be/

Gr{oetje,eeting}s,

Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds


[PATCH 6/6] arm64: document the choice of page attributes for pgprot_dmacoherent

2019-08-16 Thread Christoph Hellwig
Based on an email from Will Deacon.

Signed-off-by: Christoph Hellwig 
---
 arch/arm64/include/asm/pgtable.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 6700371227d1..6ff221d9a631 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -435,6 +435,14 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_NC) | 
PTE_PXN | PTE_UXN)
 #define pgprot_device(prot) \
__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRE) 
| PTE_PXN | PTE_UXN)
+/*
+ * DMA allocations for non-coherent devices use what the Arm architecture calls
+ * "Normal non-cacheable" memory, which permits speculation, unaligned accesses
+ * and merging of writes.  This is different from "Strongly Ordered" memory
+ * which is intended for MMIO and thus forbids speculation, preserves access
+ * size, requires strict alignment and also forces write responses to come from
+ * the endpoint.
+ */
 #define pgprot_dmacoherent(prot) \
__pgprot_modify(prot, PTE_ATTRINDX_MASK, \
PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)
-- 
2.20.1



[PATCH 5/6] dma-mapping: make dma_atomic_pool_init self-contained

2019-08-16 Thread Christoph Hellwig
The memory allocated for the atomic pool needs to have the same
mapping attributes that we use for remapping, so use
pgprot_dmacoherent instead of open coding it.  Also deduce a
suitable zone to allocate the memory from, based on the presence
of the DMA zones.

Signed-off-by: Christoph Hellwig 
---
 arch/arc/mm/dma.c   |  6 --
 arch/arm64/mm/dma-mapping.c |  6 --
 arch/csky/mm/dma-mapping.c  |  6 --
 arch/nds32/kernel/dma.c |  6 --
 include/linux/dma-mapping.h |  1 -
 kernel/dma/remap.c  | 17 ++---
 6 files changed, 14 insertions(+), 28 deletions(-)

diff --git a/arch/arc/mm/dma.c b/arch/arc/mm/dma.c
index 62c210e7ee4c..ff4a5752f8cc 100644
--- a/arch/arc/mm/dma.c
+++ b/arch/arc/mm/dma.c
@@ -104,9 +104,3 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, 
u64 size,
dev_info(dev, "use %sncoherent DMA ops\n",
 dev->dma_coherent ? "" : "non");
 }
-
-static int __init atomic_pool_init(void)
-{
-   return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
-}
-postcore_initcall(atomic_pool_init);
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 676efcda51e6..a1d05f669f67 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -28,12 +28,6 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
__dma_flush_area(page_address(page), size);
 }
 
-static int __init arm64_dma_init(void)
-{
-   return dma_atomic_pool_init(GFP_DMA32, __pgprot(PROT_NORMAL_NC));
-}
-arch_initcall(arm64_dma_init);
-
 #ifdef CONFIG_IOMMU_DMA
 void arch_teardown_dma_ops(struct device *dev)
 {
diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
index 80783bb71c5c..602a60d47a94 100644
--- a/arch/csky/mm/dma-mapping.c
+++ b/arch/csky/mm/dma-mapping.c
@@ -14,12 +14,6 @@
 #include 
 #include 
 
-static int __init atomic_pool_init(void)
-{
-   return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
-}
-postcore_initcall(atomic_pool_init);
-
 void arch_dma_prep_coherent(struct page *page, size_t size)
 {
if (PageHighMem(page)) {
diff --git a/arch/nds32/kernel/dma.c b/arch/nds32/kernel/dma.c
index 490e3720d694..4206d4b6c8ce 100644
--- a/arch/nds32/kernel/dma.c
+++ b/arch/nds32/kernel/dma.c
@@ -80,9 +80,3 @@ void arch_dma_prep_coherent(struct page *page, size_t size)
 {
cache_op(page_to_phys(page), size, cpu_dma_wbinval_range);
 }
-
-static int __init atomic_pool_init(void)
-{
-   return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
-}
-postcore_initcall(atomic_pool_init);
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f7d1eea32c78..48ebe8295987 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -624,7 +624,6 @@ void *dma_common_pages_remap(struct page **pages, size_t 
size,
const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size, unsigned long 
vm_flags);
 
-int __init dma_atomic_pool_init(gfp_t gfp, pgprot_t prot);
 bool dma_in_atomic_pool(void *start, size_t size);
 void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags);
 bool dma_free_from_pool(void *start, size_t size);
diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index ffe78f0b2fe4..838123f79639 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -105,7 +105,16 @@ static int __init early_coherent_pool(char *p)
 }
 early_param("coherent_pool", early_coherent_pool);
 
-int __init dma_atomic_pool_init(gfp_t gfp, pgprot_t prot)
+static gfp_t dma_atomic_pool_gfp(void)
+{
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   return GFP_DMA;
+   if (IS_ENABLED(CONFIG_ZONE_DMA32))
+   return GFP_DMA32;
+   return GFP_KERNEL;
+}
+
+static int __init dma_atomic_pool_init(void)
 {
unsigned int pool_size_order = get_order(atomic_pool_size);
unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
@@ -117,7 +126,7 @@ int __init dma_atomic_pool_init(gfp_t gfp, pgprot_t prot)
page = dma_alloc_from_contiguous(NULL, nr_pages,
 pool_size_order, false);
else
-   page = alloc_pages(gfp, pool_size_order);
+   page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order);
if (!page)
goto out;
 
@@ -128,7 +137,8 @@ int __init dma_atomic_pool_init(gfp_t gfp, pgprot_t prot)
goto free_page;
 
addr = dma_common_contiguous_remap(page, atomic_pool_size, VM_USERMAP,
-  prot, __builtin_return_address(0));
+  pgprot_dmacoherent(PAGE_KERNEL),
+  __builtin_return_address(0));
if (!addr)
goto destroy_genpool;
 
@@ -155,6 +165,7 @@ int __init dma_atomic_pool_init(gfp_t gfp, pgprot_t prot)
atomic_pool_size / 1024);
   

[PATCH 4/6] dma-mapping: remove arch_dma_mmap_pgprot

2019-08-16 Thread Christoph Hellwig
arch_dma_mmap_pgprot is used for two things:

 1) to override the "normal" uncached page attributes for mapping
memory coherent to devices that can't snoop the CPU caches
 2) to provide the special DMA_ATTR_WRITE_COMBINE semantics on older
arm systems

Replace one with the pgprot_dmacoherent macro that is already provided
by arm and much simpler to use, and lift the DMA_ATTR_WRITE_COMBINE
handling to common code with an explicit arch opt-in.

Signed-off-by: Christoph Hellwig 
---
 arch/arm/Kconfig   |  1 +
 arch/arm/mm/Kconfig|  1 -
 arch/arm/mm/dma-mapping.c  |  6 --
 arch/arm64/Kconfig |  1 -
 arch/arm64/include/asm/pgtable.h   |  4 
 arch/arm64/mm/dma-mapping.c|  6 --
 arch/m68k/Kconfig  |  1 -
 arch/m68k/include/asm/pgtable_mm.h |  3 +++
 arch/m68k/kernel/dma.c |  3 +--
 include/linux/dma-noncoherent.h| 13 +++--
 kernel/dma/Kconfig | 14 +++---
 kernel/dma/mapping.c   |  8 +---
 12 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 33b00579beff..e172fba1e8fd 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -7,6 +7,7 @@ config ARM
select ARCH_HAS_BINFMT_FLAT
select ARCH_HAS_DEBUG_VIRTUAL if MMU
select ARCH_HAS_DEVMEM_IS_ALLOWED
+   select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE
select ARCH_HAS_ELF_RANDOMIZE
select ARCH_HAS_FORTIFY_SOURCE
select ARCH_HAS_KEEPINITRD
diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index c54cd7ed90ba..0609c9e2191b 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -665,7 +665,6 @@ config ARM_LPAE
select PHYS_ADDR_T_64BIT
select SWIOTLB
select ARCH_HAS_DMA_COHERENT_TO_PFN
-   select ARCH_HAS_DMA_MMAP_PGPROT
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_SYNC_DMA_FOR_CPU
help
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index d42557ee69c2..d27b12f61737 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -2402,12 +2402,6 @@ long arch_dma_coherent_to_pfn(struct device *dev, void 
*cpu_addr,
return dma_to_pfn(dev, dma_addr);
 }
 
-pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
-   unsigned long attrs)
-{
-   return __get_dma_pgprot(attrs, prot);
-}
-
 void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs)
 {
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 3adcec05b1f6..dab9dda34206 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -13,7 +13,6 @@ config ARM64
select ARCH_HAS_DEBUG_VIRTUAL
select ARCH_HAS_DEVMEM_IS_ALLOWED
select ARCH_HAS_DMA_COHERENT_TO_PFN
-   select ARCH_HAS_DMA_MMAP_PGPROT
select ARCH_HAS_DMA_PREP_COHERENT
select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
select ARCH_HAS_ELF_RANDOMIZE
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index e09760ece844..6700371227d1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -435,6 +435,10 @@ static inline pmd_t pmd_mkdevmap(pmd_t pmd)
__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_NORMAL_NC) | 
PTE_PXN | PTE_UXN)
 #define pgprot_device(prot) \
__pgprot_modify(prot, PTE_ATTRINDX_MASK, PTE_ATTRINDX(MT_DEVICE_nGnRE) 
| PTE_PXN | PTE_UXN)
+#define pgprot_dmacoherent(prot) \
+   __pgprot_modify(prot, PTE_ATTRINDX_MASK, \
+   PTE_ATTRINDX(MT_NORMAL_NC) | PTE_PXN | PTE_UXN)
+
 #define __HAVE_PHYS_MEM_ACCESS_PROT
 struct file;
 extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index bd2b039f43a6..676efcda51e6 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -11,12 +11,6 @@
 
 #include 
 
-pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
-   unsigned long attrs)
-{
-   return pgprot_writecombine(prot);
-}
-
 void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir)
 {
diff --git a/arch/m68k/Kconfig b/arch/m68k/Kconfig
index c518d695c376..a9e564306d3e 100644
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -4,7 +4,6 @@ config M68K
default y
select ARCH_32BIT_OFF_T
select ARCH_HAS_BINFMT_FLAT
-   select ARCH_HAS_DMA_MMAP_PGPROT if MMU && !COLDFIRE
select ARCH_HAS_DMA_PREP_COHERENT if HAS_DMA && MMU && !COLDFIRE
select ARCH_HAS_SYNC_DMA_FOR_DEVICE if HAS_DMA
select ARCH_MIGHT_HAVE_PC_PARPORT if ISA
diff --git a/arch/m68k/include/asm/pgtable_mm.h 
b/arch/m68k/include/asm/pgtable_mm.h
index fe3ddd73a0cc..fde4534b974f 100644
--- a/arch/m68k/include/asm/pgtable_mm.h

[PATCH 3/6] arm-nommu: remove the unused pgprot_dmacoherent define

2019-08-16 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/arm/include/asm/pgtable-nommu.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm/include/asm/pgtable-nommu.h 
b/arch/arm/include/asm/pgtable-nommu.h
index 0b1f6799a32e..d0de24f06724 100644
--- a/arch/arm/include/asm/pgtable-nommu.h
+++ b/arch/arm/include/asm/pgtable-nommu.h
@@ -62,7 +62,6 @@ typedef pte_t *pte_addr_t;
  */
 #define pgprot_noncached(prot) (prot)
 #define pgprot_writecombine(prot) (prot)
-#define pgprot_dmacoherent(prot) (prot)
 #define pgprot_device(prot)(prot)
 
 
-- 
2.20.1



[PATCH 2/6] unicore32: remove the unused pgprot_dmacoherent define

2019-08-16 Thread Christoph Hellwig
Signed-off-by: Christoph Hellwig 
---
 arch/unicore32/include/asm/pgtable.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/unicore32/include/asm/pgtable.h 
b/arch/unicore32/include/asm/pgtable.h
index 9492aa304f03..126e961a8cb0 100644
--- a/arch/unicore32/include/asm/pgtable.h
+++ b/arch/unicore32/include/asm/pgtable.h
@@ -198,8 +198,6 @@ static inline pte_t pte_mkspecial(pte_t pte) { return pte; }
__pgprot(pgprot_val(prot) & ~PTE_CACHEABLE)
 #define pgprot_writecombine(prot)  \
__pgprot(pgprot_val(prot) & ~PTE_CACHEABLE)
-#define pgprot_dmacoherent(prot)   \
-   __pgprot(pgprot_val(prot) & ~PTE_CACHEABLE)
 
 #define pmd_none(pmd)  (!pmd_val(pmd))
 #define pmd_present(pmd)   (pmd_val(pmd) & PMD_PRESENT)
-- 
2.20.1



Re: [PATCH] powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB

2019-08-16 Thread Greg Kroah-Hartman
On Fri, Aug 16, 2019 at 11:42:22AM +1000, Michael Ellerman wrote:
> Greg Kroah-Hartman  writes:
> > On Thu, Aug 15, 2019 at 02:55:42PM +1000, Alastair D'Silva wrote:
> >> From: Alastair D'Silva 
> >> 
> >> Heads Up: This patch cannot be submitted to Linus's tree, as the affected
> >> assembler functions have already been converted to C.
> 
> That was done in upstream commit:
> 
> 22e9c88d486a ("powerpc/64: reuse PPC32 static inline flush_dcache_range()")
> 
> Which is a larger change that we don't want to backport. This patch is a
> minimal fix for stable trees.
> 
> 
> >> When calling flush_(inval_)dcache_range with a size >4GB, we were masking
> >> off the upper 32 bits, so we would incorrectly flush a range smaller
> >> than intended.
> >> 
> >> This patch replaces the 32 bit shifts with 64 bit ones, so that
> >> the full size is accounted for.
> >> 
> >> Signed-off-by: Alastair D'Silva 
> >> ---
> >>  arch/powerpc/kernel/misc_64.S | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> Acked-by: Michael Ellerman 
> 
> > 
> >
> > This is not the correct way to submit patches for inclusion in the
> > stable kernel tree.  Please read:
> > https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> > for how to do this properly.
> >
> > 
> 
> Hi Greg,
> 
> This is "option 3", submit the patch directly, and the patch "deviates
> from the original upstream patch" because the upstream patch was a
> wholesale conversion from asm to C.
> 
> This patch applies cleanly to v4.14 and v4.19.
> 
> The change log should have mentioned which upstream patch it is not a
> backport of, is there anything else we should have done differently to
> avoid the formletter bot :)

That is exactly what you should have done.  It needs to be VERY explicit
as to why this is being submitted differently from what upstream did,
which trees it needs to go to, and who is going to be responsible for it
when it breaks.  And it will break :)

thanks,

greg k-h


[PATCH 1/6] MIPS: remove support for DMA_ATTR_WRITE_COMBINE

2019-08-16 Thread Christoph Hellwig
MIPS uses the KSEG1 kernel memory segment to map DMA coherent
allocations for non-coherent devices as uncachable, and does not have
any kind of special support for DMA_ATTR_WRITE_COMBINE in the allocation
path.  Thus supporting DMA_ATTR_WRITE_COMBINE in dma_mmap_attrs will
lead to multiple mappings with different caching attributes.

Fixes: 8c172467be36 ("MIPS: Add implementation of dma_map_ops.mmap()")
Signed-off-by: Christoph Hellwig 
---
 arch/mips/Kconfig  | 1 -
 arch/mips/mm/dma-noncoherent.c | 8 
 2 files changed, 9 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index d50fafd7bf3a..86e6760ef0d0 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -1119,7 +1119,6 @@ config DMA_PERDEV_COHERENT
 
 config DMA_NONCOHERENT
bool
-   select ARCH_HAS_DMA_MMAP_PGPROT
select ARCH_HAS_SYNC_DMA_FOR_DEVICE
select ARCH_HAS_UNCACHED_SEGMENT
select NEED_DMA_MAP_STATE
diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c
index ed56c6fa7be2..1d4d57dd9acf 100644
--- a/arch/mips/mm/dma-noncoherent.c
+++ b/arch/mips/mm/dma-noncoherent.c
@@ -65,14 +65,6 @@ long arch_dma_coherent_to_pfn(struct device *dev, void 
*cpu_addr,
return page_to_pfn(virt_to_page(cached_kernel_address(cpu_addr)));
 }
 
-pgprot_t arch_dma_mmap_pgprot(struct device *dev, pgprot_t prot,
-   unsigned long attrs)
-{
-   if (attrs & DMA_ATTR_WRITE_COMBINE)
-   return pgprot_writecombine(prot);
-   return pgprot_noncached(prot);
-}
-
 static inline void dma_sync_virt(void *addr, size_t size,
enum dma_data_direction dir)
 {
-- 
2.20.1



cleanup the dma_pgprot handling

2019-08-16 Thread Christoph Hellwig
Hi all,

this series replaces the arch_dma_mmap_pgprot hooks with the
simpler pgprot_dmacoherent, as already used by the arm code, and
cleans up various bits around that area.

I'd still like to hear confirmation from the MIPS folks on whether
the write combine attribute can or can't work with the KSEG1
uncached segment.


[PATCH 6/6] driver core: initialize a default DMA mask for platform device

2019-08-16 Thread Christoph Hellwig
We still treat devices without a DMA mask as defaulting to 32-bits for
both masks, but a few releases ago we started warning about such
cases, as they require special cases to work around this sloppiness.
Add a dma_mask field to struct platform_device so that we can initialize
the dma_mask pointer in struct device and initialize both masks to
32-bits by default, replacing similar functionality in m68k and
powerpc.  The arch_setup_pdev_archdata hook is now unused and removed.

Note that the code looks a little odd with the various conditionals
because we have to support platform_device structures that are
statically allocated.

Signed-off-by: Christoph Hellwig 
---
 arch/m68k/kernel/dma.c   |  9 ---
 arch/powerpc/kernel/setup-common.c   |  6 -
 arch/sh/boards/mach-ap325rxa/setup.c |  1 -
 arch/sh/boards/mach-ecovec24/setup.c |  2 --
 arch/sh/boards/mach-kfr2r09/setup.c  |  1 -
 arch/sh/boards/mach-migor/setup.c|  1 -
 arch/sh/boards/mach-se/7724/setup.c  |  2 --
 drivers/base/platform.c  | 37 
 include/linux/platform_device.h  |  2 +-
 9 files changed, 17 insertions(+), 44 deletions(-)

diff --git a/arch/m68k/kernel/dma.c b/arch/m68k/kernel/dma.c
index 30cd59caf037..447849d1d645 100644
--- a/arch/m68k/kernel/dma.c
+++ b/arch/m68k/kernel/dma.c
@@ -79,12 +79,3 @@ void arch_sync_dma_for_device(struct device *dev, 
phys_addr_t handle,
break;
}
 }
-
-void arch_setup_pdev_archdata(struct platform_device *pdev)
-{
-   if (pdev->dev.coherent_dma_mask == DMA_MASK_NONE &&
-   pdev->dev.dma_mask == NULL) {
-   pdev->dev.coherent_dma_mask = DMA_BIT_MASK(32);
-   pdev->dev.dma_mask = &pdev->dev.coherent_dma_mask;
-   }
-}
diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index 1f8db666468d..5e6543aba1b3 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -778,12 +778,6 @@ void ppc_printk_progress(char *s, unsigned short hex)
pr_info("%s\n", s);
 }
 
-void arch_setup_pdev_archdata(struct platform_device *pdev)
-{
-   pdev->archdata.dma_mask = DMA_BIT_MASK(32);
-   pdev->dev.dma_mask = &pdev->archdata.dma_mask;
-}
-
 static __init void print_system_info(void)
 {
pr_info("-\n");
diff --git a/arch/sh/boards/mach-ap325rxa/setup.c b/arch/sh/boards/mach-ap325rxa/setup.c
index 8301a4378f50..665cad452798 100644
--- a/arch/sh/boards/mach-ap325rxa/setup.c
+++ b/arch/sh/boards/mach-ap325rxa/setup.c
@@ -527,7 +527,6 @@ static int __init ap325rxa_devices_setup(void)
 
/* Initialize CEU platform device separately to map memory first */
	device_initialize(&ap325rxa_ceu_device.dev);
-	arch_setup_pdev_archdata(&ap325rxa_ceu_device);
	dma_declare_coherent_memory(&ap325rxa_ceu_device.dev,
ceu_dma_membase, ceu_dma_membase,
ceu_dma_membase + CEU_BUFFER_MEMORY_SIZE - 1);
diff --git a/arch/sh/boards/mach-ecovec24/setup.c b/arch/sh/boards/mach-ecovec24/setup.c
index f402aa741bf3..acaa97459531 100644
--- a/arch/sh/boards/mach-ecovec24/setup.c
+++ b/arch/sh/boards/mach-ecovec24/setup.c
@@ -1440,7 +1440,6 @@ static int __init arch_setup(void)
 
/* Initialize CEU platform devices separately to map memory first */
	device_initialize(&ecovec_ceu_devices[0]->dev);
-	arch_setup_pdev_archdata(ecovec_ceu_devices[0]);
	dma_declare_coherent_memory(&ecovec_ceu_devices[0]->dev,
ceu0_dma_membase, ceu0_dma_membase,
ceu0_dma_membase +
@@ -1448,7 +1447,6 @@ static int __init arch_setup(void)
platform_device_add(ecovec_ceu_devices[0]);
 
	device_initialize(&ecovec_ceu_devices[1]->dev);
-	arch_setup_pdev_archdata(ecovec_ceu_devices[1]);
	dma_declare_coherent_memory(&ecovec_ceu_devices[1]->dev,
ceu1_dma_membase, ceu1_dma_membase,
ceu1_dma_membase +
diff --git a/arch/sh/boards/mach-kfr2r09/setup.c b/arch/sh/boards/mach-kfr2r09/setup.c
index 1cf9a47ac90e..96538ba3aa32 100644
--- a/arch/sh/boards/mach-kfr2r09/setup.c
+++ b/arch/sh/boards/mach-kfr2r09/setup.c
@@ -601,7 +601,6 @@ static int __init kfr2r09_devices_setup(void)
 
/* Initialize CEU platform device separately to map memory first */
	device_initialize(&kfr2r09_ceu_device.dev);
-	arch_setup_pdev_archdata(&kfr2r09_ceu_device);
	dma_declare_coherent_memory(&kfr2r09_ceu_device.dev,
ceu_dma_membase, ceu_dma_membase,
ceu_dma_membase + CEU_BUFFER_MEMORY_SIZE - 1);
diff --git a/arch/sh/boards/mach-migor/setup.c b/arch/sh/boards/mach-migor/setup.c
index 90702740f207..9ed369dad62d 100644
--- a/arch/sh/boards/mach-migor/setup.c
+++ b/arch/sh/boards/mach-migor/setup.c
@@ -602,7 +602,6 @@ static int __init migor_devices_setup(void)
 
/* Initialize CEU platform device 

[PATCH 5/6] dma-mapping: remove is_device_dma_capable

2019-08-16 Thread Christoph Hellwig
No users left.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-mapping.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index f7d1eea32c78..14702e2d6fa8 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -149,11 +149,6 @@ static inline int valid_dma_direction(int dma_direction)
(dma_direction == DMA_FROM_DEVICE));
 }
 
-static inline int is_device_dma_capable(struct device *dev)
-{
-   return dev->dma_mask != NULL && *dev->dma_mask != DMA_MASK_NONE;
-}
-
 #ifdef CONFIG_DMA_DECLARE_COHERENT
 /*
  * These three functions are only for dma allocator.
-- 
2.20.1



[PATCH 3/6] usb: add a HCD_DMA flag instead of guestimating DMA capabilities

2019-08-16 Thread Christoph Hellwig
The usb core is the only major place in the kernel that checks for
a non-NULL device dma_mask to see if a device is DMA capable.  This
is generally a bad idea, as all major busses always set up a DMA mask,
even if the device is not DMA capable - in fact bus layers like PCI
can't even know if a device is DMA capable at enumeration time.  This
leads to lots of workarounds in HCD drivers, and also prevented us from
setting up a DMA mask for platform devices by default the last time we
tried.

Replace this guess with an explicit HCD_DMA flag that is set by the
drivers that appear to have DMA support.

Signed-off-by: Christoph Hellwig 
---
 drivers/staging/octeon-usb/octeon-hcd.c | 2 +-
 drivers/usb/core/hcd.c  | 1 -
 drivers/usb/dwc2/hcd.c  | 6 +++---
 drivers/usb/host/ehci-grlib.c   | 2 +-
 drivers/usb/host/ehci-hcd.c | 2 +-
 drivers/usb/host/ehci-pmcmsp.c  | 2 +-
 drivers/usb/host/ehci-ppc-of.c  | 2 +-
 drivers/usb/host/ehci-ps3.c | 2 +-
 drivers/usb/host/ehci-sh.c  | 2 +-
 drivers/usb/host/ehci-xilinx-of.c   | 2 +-
 drivers/usb/host/fhci-hcd.c | 2 +-
 drivers/usb/host/fotg210-hcd.c  | 2 +-
 drivers/usb/host/imx21-hcd.c| 2 +-
 drivers/usb/host/isp116x-hcd.c  | 6 --
 drivers/usb/host/isp1362-hcd.c  | 5 -
 drivers/usb/host/ohci-hcd.c | 2 +-
 drivers/usb/host/ohci-ppc-of.c  | 2 +-
 drivers/usb/host/ohci-ps3.c | 2 +-
 drivers/usb/host/ohci-sa.c  | 2 +-
 drivers/usb/host/ohci-sm501.c   | 2 +-
 drivers/usb/host/ohci-tmio.c| 2 +-
 drivers/usb/host/oxu210hp-hcd.c | 3 ---
 drivers/usb/host/r8a66597-hcd.c | 6 --
 drivers/usb/host/sl811-hcd.c| 6 --
 drivers/usb/host/u132-hcd.c | 2 --
 drivers/usb/host/uhci-grlib.c   | 2 +-
 drivers/usb/host/uhci-pci.c | 2 +-
 drivers/usb/host/uhci-platform.c| 2 +-
 drivers/usb/host/xhci.c | 2 +-
 drivers/usb/isp1760/isp1760-core.c  | 3 ---
 drivers/usb/isp1760/isp1760-if.c| 1 -
 drivers/usb/musb/musb_host.c| 2 +-
 drivers/usb/renesas_usbhs/mod_host.c| 2 +-
 include/linux/usb.h | 1 -
 include/linux/usb/hcd.h | 7 +--
 35 files changed, 31 insertions(+), 62 deletions(-)

diff --git a/drivers/staging/octeon-usb/octeon-hcd.c b/drivers/staging/octeon-usb/octeon-hcd.c
index cd2b777073c4..a5321cc692c5 100644
--- a/drivers/staging/octeon-usb/octeon-hcd.c
+++ b/drivers/staging/octeon-usb/octeon-hcd.c
@@ -3512,7 +3512,7 @@ static const struct hc_driver octeon_hc_driver = {
.product_desc   = "Octeon Host Controller",
.hcd_priv_size  = sizeof(struct octeon_hcd),
.irq= octeon_usb_irq,
-   .flags  = HCD_MEMORY | HCD_USB2,
+   .flags  = HCD_MEMORY | HCD_DMA | HCD_USB2,
.start  = octeon_usb_start,
.stop   = octeon_usb_stop,
.urb_enqueue= octeon_usb_urb_enqueue,
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 8592c0344fe8..add2af4af766 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -2454,7 +2454,6 @@ struct usb_hcd *__usb_create_hcd(const struct hc_driver *driver,
hcd->self.controller = dev;
hcd->self.sysdev = sysdev;
hcd->self.bus_name = bus_name;
-   hcd->self.uses_dma = (sysdev->dma_mask != NULL);
 
	timer_setup(&hcd->rh_timer, rh_timer_func, 0);
 #ifdef CONFIG_PM
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index 111787a137ee..81afe553aa66 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -5062,13 +5062,13 @@ int dwc2_hcd_init(struct dwc2_hsotg *hsotg)
dwc2_hc_driver.reset_device = dwc2_reset_device;
}
 
+   if (hsotg->params.host_dma)
+   dwc2_hc_driver.flags |= HCD_DMA;
+
	hcd = usb_create_hcd(&dwc2_hc_driver, hsotg->dev, dev_name(hsotg->dev));
if (!hcd)
goto error1;
 
-   if (!hsotg->params.host_dma)
-   hcd->self.uses_dma = 0;
-
hcd->has_tt = 1;
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
diff --git a/drivers/usb/host/ehci-grlib.c b/drivers/usb/host/ehci-grlib.c
index 656b8c08efc8..a2c3b4ec8a8b 100644
--- a/drivers/usb/host/ehci-grlib.c
+++ b/drivers/usb/host/ehci-grlib.c
@@ -30,7 +30,7 @@ static const struct hc_driver ehci_grlib_hc_driver = {
 * generic hardware linkage
 */
.irq= ehci_irq,
-   .flags  = HCD_MEMORY | HCD_USB2 | HCD_BH,
+   .flags  = HCD_MEMORY | HCD_DMA | HCD_USB2 | HCD_BH,
 
/*
 * basic lifecycle operations
diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index 9da7e22848c9..cf2b7ae93b7e 100644
--- 

[PATCH 4/6] usb/max3421: remove the dummy {un, }map_urb_for_dma methods

2019-08-16 Thread Christoph Hellwig
Now that we have an explicit HCD_DMA flag, there is no need to override
these methods.

Signed-off-by: Christoph Hellwig 
---
 drivers/usb/host/max3421-hcd.c | 17 -
 1 file changed, 17 deletions(-)

diff --git a/drivers/usb/host/max3421-hcd.c b/drivers/usb/host/max3421-hcd.c
index afa321ab55fc..8819f502b6a6 100644
--- a/drivers/usb/host/max3421-hcd.c
+++ b/drivers/usb/host/max3421-hcd.c
@@ -1800,21 +1800,6 @@ max3421_bus_resume(struct usb_hcd *hcd)
return -1;
 }
 
-/*
- * The SPI driver already takes care of DMA-mapping/unmapping, so no
- * reason to do it twice.
- */
-static int
-max3421_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb, gfp_t mem_flags)
-{
-   return 0;
-}
-
-static void
-max3421_unmap_urb_for_dma(struct usb_hcd *hcd, struct urb *urb)
-{
-}
-
 static const struct hc_driver max3421_hcd_desc = {
.description =  "max3421",
.product_desc = DRIVER_DESC,
@@ -1826,8 +1811,6 @@ static const struct hc_driver max3421_hcd_desc = {
.get_frame_number = max3421_get_frame_number,
.urb_enqueue =  max3421_urb_enqueue,
.urb_dequeue =  max3421_urb_dequeue,
-   .map_urb_for_dma =  max3421_map_urb_for_dma,
-   .unmap_urb_for_dma =max3421_unmap_urb_for_dma,
.endpoint_disable = max3421_endpoint_disable,
.hub_status_data =  max3421_hub_status_data,
.hub_control =  max3421_hub_control,
-- 
2.20.1



[PATCH 2/6] usb: add a hcd_uses_dma helper

2019-08-16 Thread Christoph Hellwig
The USB buffer allocation code is the only place in the usb core (and in
fact the whole kernel) that uses is_device_dma_capable, while the URB
mapping code uses the uses_dma flag in struct usb_bus.  Switch the buffer
allocation to use the uses_dma flag used by the rest of the USB code,
and create a helper in hcd.h that checks this flag as well as
CONFIG_HAS_DMA to simplify the callers a bit.

Signed-off-by: Christoph Hellwig 
---
 drivers/usb/core/buffer.c | 10 +++---
 drivers/usb/core/hcd.c|  4 ++--
 drivers/usb/dwc2/hcd.c|  2 +-
 include/linux/usb.h   |  2 +-
 include/linux/usb/hcd.h   |  3 +++
 5 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
index 1a5b3dcae930..6cf22c27f2d2 100644
--- a/drivers/usb/core/buffer.c
+++ b/drivers/usb/core/buffer.c
@@ -66,9 +66,7 @@ int hcd_buffer_create(struct usb_hcd *hcd)
charname[16];
int i, size;
 
-   if (hcd->localmem_pool ||
-   !IS_ENABLED(CONFIG_HAS_DMA) ||
-   !is_device_dma_capable(hcd->self.sysdev))
+   if (hcd->localmem_pool || !hcd_uses_dma(hcd))
return 0;
 
for (i = 0; i < HCD_BUFFER_POOLS; i++) {
@@ -129,8 +127,7 @@ void *hcd_buffer_alloc(
return gen_pool_dma_alloc(hcd->localmem_pool, size, dma);
 
/* some USB hosts just use PIO */
-   if (!IS_ENABLED(CONFIG_HAS_DMA) ||
-   !is_device_dma_capable(bus->sysdev)) {
+   if (!hcd_uses_dma(hcd)) {
*dma = ~(dma_addr_t) 0;
return kmalloc(size, mem_flags);
}
@@ -160,8 +157,7 @@ void hcd_buffer_free(
return;
}
 
-   if (!IS_ENABLED(CONFIG_HAS_DMA) ||
-   !is_device_dma_capable(bus->sysdev)) {
+   if (!hcd_uses_dma(hcd)) {
kfree(addr);
return;
}
diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
index 2ccbc2f83570..8592c0344fe8 100644
--- a/drivers/usb/core/hcd.c
+++ b/drivers/usb/core/hcd.c
@@ -1412,7 +1412,7 @@ int usb_hcd_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
	if (usb_endpoint_xfer_control(&urb->ep->desc)) {
if (hcd->self.uses_pio_for_control)
return ret;
-   if (IS_ENABLED(CONFIG_HAS_DMA) && hcd->self.uses_dma) {
+   if (hcd_uses_dma(hcd)) {
if (is_vmalloc_addr(urb->setup_packet)) {
			WARN_ONCE(1, "setup packet is not dma capable\n");
return -EAGAIN;
@@ -1446,7 +1446,7 @@ int usb_hcd_map_urb_for_dma(struct usb_hcd *hcd, struct urb *urb,
dir = usb_urb_dir_in(urb) ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
if (urb->transfer_buffer_length != 0
&& !(urb->transfer_flags & URB_NO_TRANSFER_DMA_MAP)) {
-   if (IS_ENABLED(CONFIG_HAS_DMA) && hcd->self.uses_dma) {
+   if (hcd_uses_dma(hcd)) {
if (urb->num_sgs) {
int n;
 
diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c
index ee144ff8af5b..111787a137ee 100644
--- a/drivers/usb/dwc2/hcd.c
+++ b/drivers/usb/dwc2/hcd.c
@@ -4608,7 +4608,7 @@ static int _dwc2_hcd_urb_enqueue(struct usb_hcd *hcd, struct urb *urb,
 
buf = urb->transfer_buffer;
 
-   if (hcd->self.uses_dma) {
+   if (hcd_uses_dma(hcd)) {
if (!buf && (urb->transfer_dma & 3)) {
dev_err(hsotg->dev,
"%s: unaligned transfer with no 
transfer_buffer",
diff --git a/include/linux/usb.h b/include/linux/usb.h
index 83d35d993e8c..e87826e23d59 100644
--- a/include/linux/usb.h
+++ b/include/linux/usb.h
@@ -1457,7 +1457,7 @@ typedef void (*usb_complete_t)(struct urb *);
  * field rather than determining a dma address themselves.
  *
  * Note that transfer_buffer must still be set if the controller
- * does not support DMA (as indicated by bus.uses_dma) and when talking
+ * does not support DMA (as indicated by hcd_uses_dma()) and when talking
  * to root hub. If you have to trasfer between highmem zone and the device
  * on such controller, create a bounce buffer or bail out with an error.
  * If transfer_buffer cannot be set (is in highmem) and the controller is DMA
diff --git a/include/linux/usb/hcd.h b/include/linux/usb/hcd.h
index bab27ccc8ff5..a20e7815d814 100644
--- a/include/linux/usb/hcd.h
+++ b/include/linux/usb/hcd.h
@@ -422,6 +422,9 @@ static inline bool hcd_periodic_completion_in_progress(struct usb_hcd *hcd,
return hcd->high_prio_bh.completing_ep == ep;
 }
 
+#define hcd_uses_dma(hcd) \
+   (IS_ENABLED(CONFIG_HAS_DMA) && (hcd)->self.uses_dma)
+
 extern int usb_hcd_link_urb_to_ep(struct usb_hcd *hcd, struct urb *urb);
 extern int usb_hcd_check_unlink_urb(struct usb_hcd *hcd, struct urb *urb,
int status);
-- 
2.20.1



[PATCH 1/6] usb: don't create dma pools for HCDs with a localmem_pool

2019-08-16 Thread Christoph Hellwig
If the HCD provides a localmem pool, we will never use the DMA pools, so
don't create them.

Fixes: b0310c2f09bb ("USB: use genalloc for USB HCs with local memory")
Signed-off-by: Christoph Hellwig 
---
 drivers/usb/core/buffer.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/core/buffer.c b/drivers/usb/core/buffer.c
index 1359b78a624e..1a5b3dcae930 100644
--- a/drivers/usb/core/buffer.c
+++ b/drivers/usb/core/buffer.c
@@ -66,9 +66,9 @@ int hcd_buffer_create(struct usb_hcd *hcd)
charname[16];
int i, size;
 
-   if (!IS_ENABLED(CONFIG_HAS_DMA) ||
-   (!is_device_dma_capable(hcd->self.sysdev) &&
-!hcd->localmem_pool))
+   if (hcd->localmem_pool ||
+   !IS_ENABLED(CONFIG_HAS_DMA) ||
+   !is_device_dma_capable(hcd->self.sysdev))
return 0;
 
for (i = 0; i < HCD_BUFFER_POOLS; i++) {
-- 
2.20.1



Re: 5.2.7 kernel doesn't boot on G5

2019-08-16 Thread Christian Marillat
On 15 August 2019 at 19:50, christophe leroy wrote:

> On 15/08/2019 at 19:48, Christian Marillat wrote:
>> On 15 August 2019 at 19:29, christophe leroy wrote:
>>
>>> On 15/08/2019 at 19:05, Mathieu Malaterre wrote:
 Does that ring a bell to anyone here ? Thanks
>>>
>>> Apparently that's 5.2.0, not 5.2.7
>>
>> Yes, 5.2.7 is the Debian package version. Sorry for the mistake.
>>
>
> Can you test with the latest stable version, i.e. 5.2.8?

I need to build a kernel.

Is there some documentation on how to cross-compile a kernel?

Christian


next take at setting up a dma mask by default for platform devices v2

2019-08-16 Thread Christoph Hellwig
Hi all,

this is another attempt to make sure the dma_mask pointer is always
initialized for platform devices.  Not doing so has led to lots of
boilerplate code, and makes platform devices different from all our
major busses like PCI where we always set up a dma_mask.  In the long
run this should also help to eventually make dma_mask a scalar value
instead of a pointer and remove even more cruft.

The bigger blocker for this last time was the fact that the usb
subsystem uses the presence or lack of a dma_mask to check if the core
should do dma mapping for the driver, which is highly unusual.  So we
fix this first.  Note that this has some overlap with the pending
desire to use the proper dma_mmap_coherent helper for mapping usb
buffers.  The first two patches have already been queued up by Greg
and are only included for completeness.

Changes since v1:
 - fix a compile error in the ppc of ohci driver
 - revamp the last patch to get rid of the archdata callout entirely.


Re: [PATCH v4 22/25] powernv/fadump: Warn before processing partial crashdump

2019-08-16 Thread Mahesh J Salgaonkar
On 2019-07-16 17:04:38 Tue, Hari Bathini wrote:
> If not all kernel boot memory regions are registered for MPIPL before
> system crashes, try processing the partial crashdump but warn the user
> before proceeding.
> 
> Signed-off-by: Hari Bathini 
> ---
>  arch/powerpc/platforms/powernv/opal-fadump.c |   21 +
>  1 file changed, 21 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/powernv/opal-fadump.c b/arch/powerpc/platforms/powernv/opal-fadump.c
> index b55f25c..3ef212d 100644
> --- a/arch/powerpc/platforms/powernv/opal-fadump.c
> +++ b/arch/powerpc/platforms/powernv/opal-fadump.c
> @@ -136,6 +136,27 @@ static void opal_fadump_get_config(struct fw_dump *fadump_conf,
>   last_end = base + size;
>   }
>  
> + /*
> +  * Rarely, but it can so happen that system crashes before all
> +  * boot memory regions are registered for MPIPL. In such
> +  * cases, warn that the vmcore may not be accurate and proceed
> +  * anyway as that is the best bet considering free pages, cache
> +  * pages, user pages, etc are usually filtered out.
> +  *
> +  * Hope the memory that could not be preserved only has pages
> +  * that are usually filtered out while saving the vmcore.
> +  */
> + if (fdm->region_cnt < fdm->registered_regions) {
> +		pr_warn("The crashdump may not be accurate as the below boot memory regions could not be preserved:\n");

This would be opal crashing while the kernel is in the middle of gearing
itself up for fadump. If you decide to still go ahead with a partial dump,
you will need a clear warning message that dump capture (makedumpfile
capture) may fail, but we will still have a full opal core that can help in analysis.

Thanks,
-Mahesh.