Re: [PATCH v2] iommu/iova: silence warnings under memory pressure
On Fri, 2019-11-22 at 11:46 -0500, Qian Cai wrote:
> On Fri, 2019-11-22 at 08:28 -0800, Joe Perches wrote:
> > On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote:
> > > On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> > > > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > > > > When running heavy memory pressure workloads, this 5+ year-old
> > > > > system is throwing endless warnings below because disk IO is too
> > > > > slow to recover from swapping. Since the volume from
> > > > > alloc_iova_fast() could be large, once it calls printk(), it will
> > > > > trigger disk IO (writing to the log files) and pending softirqs
> > > > > which could cause an infinite loop and make no progress for days
> > > > > by the ongoing memory reclaim. This is the counterpart for Intel
> > > > > where the AMD part has already been merged. See the commit
> > > > > 3d708895325b ("iommu/amd: Silence warnings under memory
> > > > > pressure"). Since the allocation failure will be reported in
> > > > > intel_alloc_iova(), just call printk_ratelimited() there and
> > > > > silence the one in alloc_iova_mem() to avoid the expensive
> > > > > warn_alloc().
> > > > []
> > > > > v2: use dev_err_ratelimited() and improve the commit messages.
> > > > []
> > > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > > []
> > > > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> > > > >  	iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> > > > >  				   IOVA_PFN(dma_mask), true);
> > > > >  	if (unlikely(!iova_pfn)) {
> > > > > -		dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > > > > +		dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > > > > +				    nrpages);
> > > >
> > > > Trivia:
> > > >
> > > > This should really have a \n termination on the format string
> > > >
> > > > 	dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
> > >
> > > Why do you say so? It is right now printing with a newline added anyway.
> > >
> > > hpsa :03:00.0: DMAR: Allocating 1-page iova failed
> >
> > If another process uses pr_cont at the same time,
> > it can be interleaved.
>
> I lean towards fixing that in a separate patch if ever needed, as the
> original dev_err() has no "\n" enclosed either.

Your choice. I wrote trivia:, but touching the same line multiple times
is relatively pointless.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] iommu/iova: silence warnings under memory pressure
On Fri, 2019-11-22 at 08:28 -0800, Joe Perches wrote:
> On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote:
> > On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> > > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > > > When running heavy memory pressure workloads, this 5+ year-old
> > > > system is throwing endless warnings below because disk IO is too
> > > > slow to recover from swapping. Since the volume from
> > > > alloc_iova_fast() could be large, once it calls printk(), it will
> > > > trigger disk IO (writing to the log files) and pending softirqs
> > > > which could cause an infinite loop and make no progress for days
> > > > by the ongoing memory reclaim. This is the counterpart for Intel
> > > > where the AMD part has already been merged. See the commit
> > > > 3d708895325b ("iommu/amd: Silence warnings under memory
> > > > pressure"). Since the allocation failure will be reported in
> > > > intel_alloc_iova(), just call printk_ratelimited() there and
> > > > silence the one in alloc_iova_mem() to avoid the expensive
> > > > warn_alloc().
> > > []
> > > > v2: use dev_err_ratelimited() and improve the commit messages.
> > > []
> > > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > > []
> > > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> > > >  	iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> > > >  				   IOVA_PFN(dma_mask), true);
> > > >  	if (unlikely(!iova_pfn)) {
> > > > -		dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > > > +		dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > > > +				    nrpages);
> > >
> > > Trivia:
> > >
> > > This should really have a \n termination on the format string
> > >
> > > 	dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
> >
> > Why do you say so? It is right now printing with a newline added anyway.
> >
> > hpsa :03:00.0: DMAR: Allocating 1-page iova failed
>
> If another process uses pr_cont at the same time,
> it can be interleaved.

I lean towards fixing that in a separate patch if ever needed, as the
original dev_err() has no "\n" enclosed either.
Re: [PATCH v2] iommu/iova: silence warnings under memory pressure
On Fri, 2019-11-22 at 09:59 -0500, Qian Cai wrote:
> On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> > On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > > When running heavy memory pressure workloads, this 5+ year-old
> > > system is throwing endless warnings below because disk IO is too
> > > slow to recover from swapping. Since the volume from
> > > alloc_iova_fast() could be large, once it calls printk(), it will
> > > trigger disk IO (writing to the log files) and pending softirqs
> > > which could cause an infinite loop and make no progress for days
> > > by the ongoing memory reclaim. This is the counterpart for Intel
> > > where the AMD part has already been merged. See the commit
> > > 3d708895325b ("iommu/amd: Silence warnings under memory
> > > pressure"). Since the allocation failure will be reported in
> > > intel_alloc_iova(), just call printk_ratelimited() there and
> > > silence the one in alloc_iova_mem() to avoid the expensive
> > > warn_alloc().
> > []
> > > v2: use dev_err_ratelimited() and improve the commit messages.
> > []
> > > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> > []
> > > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> > >  	iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> > >  				   IOVA_PFN(dma_mask), true);
> > >  	if (unlikely(!iova_pfn)) {
> > > -		dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > > +		dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > > +				    nrpages);
> >
> > Trivia:
> >
> > This should really have a \n termination on the format string
> >
> > 	dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
>
> Why do you say so? It is right now printing with a newline added anyway.
>
> hpsa :03:00.0: DMAR: Allocating 1-page iova failed

If another process uses pr_cont at the same time,
it can be interleaved.
Re: [PATCH v2] iommu/iova: silence warnings under memory pressure
On Thu, 2019-11-21 at 20:37 -0800, Joe Perches wrote:
> On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> > When running heavy memory pressure workloads, this 5+ year-old
> > system is throwing endless warnings below because disk IO is too
> > slow to recover from swapping. Since the volume from
> > alloc_iova_fast() could be large, once it calls printk(), it will
> > trigger disk IO (writing to the log files) and pending softirqs
> > which could cause an infinite loop and make no progress for days
> > by the ongoing memory reclaim. This is the counterpart for Intel
> > where the AMD part has already been merged. See the commit
> > 3d708895325b ("iommu/amd: Silence warnings under memory
> > pressure"). Since the allocation failure will be reported in
> > intel_alloc_iova(), just call printk_ratelimited() there and
> > silence the one in alloc_iova_mem() to avoid the expensive
> > warn_alloc().
> []
> > v2: use dev_err_ratelimited() and improve the commit messages.
> []
> > diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> []
> > @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
> >  	iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
> >  				   IOVA_PFN(dma_mask), true);
> >  	if (unlikely(!iova_pfn)) {
> > -		dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> > +		dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> > +				    nrpages);
>
> Trivia:
>
> This should really have a \n termination on the format string
>
> 	dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",

Why do you say so? It is right now printing with a newline added anyway.

hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
Re: [PATCH v2] iommu/iova: silence warnings under memory pressure
On Thu, 2019-11-21 at 21:55 -0500, Qian Cai wrote:
> When running heavy memory pressure workloads, this 5+ year-old
> system is throwing endless warnings below because disk IO is too
> slow to recover from swapping. Since the volume from
> alloc_iova_fast() could be large, once it calls printk(), it will
> trigger disk IO (writing to the log files) and pending softirqs
> which could cause an infinite loop and make no progress for days
> by the ongoing memory reclaim. This is the counterpart for Intel
> where the AMD part has already been merged. See the commit
> 3d708895325b ("iommu/amd: Silence warnings under memory
> pressure"). Since the allocation failure will be reported in
> intel_alloc_iova(), just call printk_ratelimited() there and
> silence the one in alloc_iova_mem() to avoid the expensive
> warn_alloc().
[]
> v2: use dev_err_ratelimited() and improve the commit messages.
[]
> diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
[]
> @@ -3401,7 +3401,8 @@ static unsigned long intel_alloc_iova(struct device *dev,
>  	iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
>  				   IOVA_PFN(dma_mask), true);
>  	if (unlikely(!iova_pfn)) {
> -		dev_err(dev, "Allocating %ld-page iova failed", nrpages);
> +		dev_err_ratelimited(dev, "Allocating %ld-page iova failed",
> +				    nrpages);

Trivia:

This should really have a \n termination on the format string

	dev_err_ratelimited(dev, "Allocating %ld-page iova failed\n",
[PATCH v2] iommu/iova: silence warnings under memory pressure
When running heavy memory pressure workloads, this 5+ year-old system is
throwing endless warnings below because disk IO is too slow to recover
from swapping. Since the volume from alloc_iova_fast() could be large,
once it calls printk(), it will trigger disk IO (writing to the log
files) and pending softirqs which could cause an infinite loop and make
no progress for days by the ongoing memory reclaim. This is the
counterpart for Intel where the AMD part has already been merged. See
the commit 3d708895325b ("iommu/amd: Silence warnings under memory
pressure"). Since the allocation failure will be reported in
intel_alloc_iova(), just call printk_ratelimited() there and silence the
one in alloc_iova_mem() to avoid the expensive warn_alloc().

hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
slab_out_of_memory: 66 callbacks suppressed
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0
  node 0: slabs: 1822, objs: 16398, free: 0
  node 1: slabs: 2051, objs: 18459, free: 31
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0
  node 0: slabs: 1822, objs: 16398, free: 0
  node 1: slabs: 2051, objs: 18459, free: 31
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  cache: iommu_iova, object size: 40, buffer size: 448, default order: 0, min order: 0
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
  cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
  cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
  cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
  node 0: slabs: 697, objs: 4182, free: 0
  node 0: slabs: 697, objs: 4182, free: 0
  node 0: slabs: 697, objs: 4182, free: 0
  node 0: slabs: 697, objs: 4182, free: 0
  node 1: slabs: 381, objs: 2286, free: 27
  node 1: slabs: 381, objs: 2286, free: 27
  node 1: slabs: 381, objs: 2286, free: 27
  node 1: slabs: 381, objs: 2286, free: 27
  node 0: slabs: 1822, objs: 16398, free: 0
  cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
  node 1: slabs: 2051, objs: 18459, free: 31
  node 0: slabs: 697, objs: 4182, free: 0
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
  node 1: slabs: 381, objs: 2286, free: 27
  cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
  node 0: slabs: 697, objs: 4182, free: 0
  node 1: slabs: 381, objs: 2286, free: 27
hpsa :03:00.0: DMAR: Allocating 1-page iova failed
warn_alloc: 96 callbacks suppressed
kworker/11:1H: page allocation failure: order:0, mode:0xa20(GFP_ATOMIC), nodemask=(null),cpuset=/,mems_allowed=0-1
CPU: 11 PID: 1642 Comm: kworker/11:1H Tainted: G B
Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015
Workqueue: kblockd blk_mq_run_work_fn
Call Trace:
 dump_stack+0xa0/0xea
 warn_alloc.cold.94+0x8a/0x12d
 __alloc_pages_slowpath+0x1750/0x1870
 __alloc_pages_nodemask+0x58a/0x710
 alloc_pages_current+0x9c/0x110
 alloc_slab_page+0xc9/0x760
 allocate_slab+0x48f/0x5d0
 new_slab+0x46/0x70
 ___slab_alloc+0x4ab/0x7b0
 __slab_alloc+0x43/0x70
 kmem_cache_alloc+0x2dd/0x450
SLUB: Unable to allocate memory on node -1, gfp=0xa20(GFP_ATOMIC)
 alloc_iova+0x33/0x210
  cache: skbuff_head_cache, object size: 208, buffer size: 640, default order: 0, min order: 0
  node 0: slabs: 697, objs: 4182, free: 0
 alloc_iova_fast+0x62/0x3d1
  node 1: slabs: 381, objs: 2286, free: 27
 intel_alloc_iova+0xce/0xe0
 intel_map_sg+0xed/0x410
 scsi_dma_map+0xd7/0x160
 scsi_queue_rq+0xbf7/0x1310
 blk_mq_dispatch_rq_list+0x4d9/0xbc0
 blk_mq_sched_dispatch_requests+0x24a/0x300
 __blk_mq_run_hw_queue+0x156/0x230
 blk_mq_run_work_fn+0x3b/0x40
 process_one_work+0x579/0xb90
 worker_thread+0x63/0x5b0
 kthread+0x1e6/0x210
 ret_from_fork+0x3a/0x50
Mem-Info: active_anon:2422723 inactive_anon:361971 isolated_anon:34403 active_file:2285 inactive_file:1838 isolated_file:0