Re: [PATCH kernel v3] vfio/spapr: Add cond_resched() for huge updates
On Thu, 28 Sep 2017 19:16:12 +1000
Alexey Kardashevskiy wrote:

> Clearing very big IOMMU tables can trigger soft lockups. This adds
> cond_resched() to allow the scheduler to do context switching when
> it decides to.
>
> Signed-off-by: Alexey Kardashevskiy
> ---
>
> The testcase is POWER9 box with 264GB guest, 4 VFIO devices from
> independent IOMMU groups, 64K IOMMU pages. This configuration produces
> 4325376 TCE entries, each entry update incurs 4 OPAL calls to update
> an individual PE TCE cache; this produced lockups for more than 20s.
> Reducing table size to 4194304 (i.e. 256GB guest) or removing one
> of 4 VFIO devices makes the problem go away.
>
> ---
> Changes:
> v3:
> * cond_resched() checks for should_resched() so we just call resched()
> and let the cpu scheduler decide whether to switch or not
>
> v2:
> * replaced with time based solution
> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 63112c36ab2d..759a5bdd40e1 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -507,6 +507,8 @@ static int tce_iommu_clear(struct tce_container *container,
>  	enum dma_data_direction direction;
>
>  	for ( ; pages; --pages, ++entry) {
> +		cond_resched();
> +
>  		direction = DMA_NONE;
>  		oldhpa = 0;
>  		ret = iommu_tce_xchg(tbl, entry, &oldhpa, &direction);

This looks fine to me, I've applied it to my local next branch for
v4.15. I'll push that branch next week, once I can rebase to 4.14-rc3.
Thanks,

Alex
Re: [PATCH kernel v3] vfio/spapr: Add cond_resched() for huge updates
On Thu, Sep 28, 2017 at 07:16:12PM +1000, Alexey Kardashevskiy wrote:
> Clearing very big IOMMU tables can trigger soft lockups. This adds
> cond_resched() to allow the scheduler to do context switching when
> it decides to.
>
> Signed-off-by: Alexey Kardashevskiy

Reviewed-by: David Gibson

> ---
>
> The testcase is POWER9 box with 264GB guest, 4 VFIO devices from
> independent IOMMU groups, 64K IOMMU pages. This configuration produces
> 4325376 TCE entries, each entry update incurs 4 OPAL calls to update
> an individual PE TCE cache; this produced lockups for more than 20s.
> Reducing table size to 4194304 (i.e. 256GB guest) or removing one
> of 4 VFIO devices makes the problem go away.
>
> ---
> Changes:
> v3:
> * cond_resched() checks for should_resched() so we just call resched()
> and let the cpu scheduler decide whether to switch or not
>
> v2:
> * replaced with time based solution
> ---
>  drivers/vfio/vfio_iommu_spapr_tce.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
> index 63112c36ab2d..759a5bdd40e1 100644
> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
> @@ -507,6 +507,8 @@ static int tce_iommu_clear(struct tce_container *container,
>  	enum dma_data_direction direction;
>
>  	for ( ; pages; --pages, ++entry) {
> +		cond_resched();
> +
>  		direction = DMA_NONE;
>  		oldhpa = 0;
>  		ret = iommu_tce_xchg(tbl, entry, &oldhpa, &direction);

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
[PATCH kernel v3] vfio/spapr: Add cond_resched() for huge updates
Clearing very big IOMMU tables can trigger soft lockups. This adds
cond_resched() to allow the scheduler to do context switching when
it decides to.

Signed-off-by: Alexey Kardashevskiy
---

The testcase is POWER9 box with 264GB guest, 4 VFIO devices from
independent IOMMU groups, 64K IOMMU pages. This configuration produces
4325376 TCE entries, each entry update incurs 4 OPAL calls to update
an individual PE TCE cache; this produced lockups for more than 20s.
Reducing table size to 4194304 (i.e. 256GB guest) or removing one
of 4 VFIO devices makes the problem go away.

---
Changes:
v3:
* cond_resched() checks for should_resched() so we just call resched()
and let the cpu scheduler decide whether to switch or not

v2:
* replaced with time based solution
---
 drivers/vfio/vfio_iommu_spapr_tce.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
index 63112c36ab2d..759a5bdd40e1 100644
--- a/drivers/vfio/vfio_iommu_spapr_tce.c
+++ b/drivers/vfio/vfio_iommu_spapr_tce.c
@@ -507,6 +507,8 @@ static int tce_iommu_clear(struct tce_container *container,
 	enum dma_data_direction direction;

 	for ( ; pages; --pages, ++entry) {
+		cond_resched();
+
 		direction = DMA_NONE;
 		oldhpa = 0;
 		ret = iommu_tce_xchg(tbl, entry, &oldhpa, &direction);
-- 
2.11.0
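
For reference, the entry counts quoted in the testcase note follow from
dividing guest RAM by the IOMMU page size. The sketch below is plain
userspace C, not kernel code; the helper names tce_entries() and
opal_calls() are made up for illustration and do not exist in the
kernel. It assumes one TCE entry per IOMMU page and takes the "4 OPAL
calls per entry update" figure from the note as given.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel function): number of TCE entries
 * needed to map guest_bytes of RAM, assuming one entry per IOMMU page.
 */
static uint64_t tce_entries(uint64_t guest_bytes, uint64_t iommu_page_bytes)
{
	return guest_bytes / iommu_page_bytes;
}

/*
 * Hypothetical helper: total OPAL calls for one pass over the table,
 * assuming a fixed number of calls per entry update.
 */
static uint64_t opal_calls(uint64_t entries, uint64_t calls_per_entry)
{
	return entries * calls_per_entry;
}
```

With 64K pages, tce_entries(264ULL << 30, 64ULL << 10) gives 4325376 and
tce_entries(256ULL << 30, 64ULL << 10) gives 4194304, matching the two
table sizes in the note. Under the same assumptions,
opal_calls(4325376, 4) comes to 17301504 calls for a single clear of the
larger table, all of which ran without a scheduling point before this
patch.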