Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock
On Fri, Mar 24, 2017 at 04:04:32PM -0600, Jens Axboe wrote: > On 03/24/2017 03:56 PM, Tahsin Erdogan wrote: > > blkg_conf_prep() currently calls blkg_lookup_create() while holding > > request queue spinlock. This means allocating memory for struct > > blkcg_gq has to be made non-blocking. This causes occasional -ENOMEM > > failures in call paths like below: > > > > pcpu_alloc+0x68f/0x710 > > __alloc_percpu_gfp+0xd/0x10 > > __percpu_counter_init+0x55/0xc0 > > cfq_pd_alloc+0x3b2/0x4e0 > > blkg_alloc+0x187/0x230 > > blkg_create+0x489/0x670 > > blkg_lookup_create+0x9a/0x230 > > blkg_conf_prep+0x1fb/0x240 > > __cfqg_set_weight_device.isra.105+0x5c/0x180 > > cfq_set_weight_on_dfl+0x69/0xc0 > > cgroup_file_write+0x39/0x1c0 > > kernfs_fop_write+0x13f/0x1d0 > > __vfs_write+0x23/0x120 > > vfs_write+0xc2/0x1f0 > > SyS_write+0x44/0xb0 > > entry_SYSCALL_64_fastpath+0x18/0xad > > > > In the code path above, percpu allocator cannot call vmalloc() due to > > queue spinlock. > > > > A failure in this call path gives grief to tools which are trying to > > configure io weights. We see occasional failures happen shortly after > > reboots even when system is not under any memory pressure. Machines > > with a lot of cpus are more vulnerable to this condition. > > > > Do struct blkcg_gq allocations outside the queue spinlock to allow > > blocking during memory allocations. > > This looks much simpler/cleaner to me, compared to v5. Tejun, what do > you think? So, this patch in itself looks better but now we end up with two separate mechanisms to handle non-atomic allocations. This drop lock / alloc / relock / check invariants in the main path and preallocation logic used in the init path. Right now, both proposed implementations aren't that satisfactory. Personally, I'd prefer superficial ugliness to structural duplications, but, ideally, we shouldn't have to make this choice. idk, it's a bug fix. We can always clean things up later. Acked-by: Tejun HeoThanks. -- tejun
Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock
On Sun, Mar 26, 2017 at 3:54 AM, Julia Lawallwrote: > Is an unlock needed before line 848? Or does blkg_lookup_check take care > of it? Unlock is not needed. On success, function returns with locks held. It is documented at line 805: "... This function returns with RCU read * lock and queue lock held and must be paired with blkg_conf_finish()."
Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock
Is an unlock needed before line 848? Or does blkg_lookup_check take care of it? julia -- Forwarded message -- Date: Sun, 26 Mar 2017 07:50:17 +0800 From: kbuild test robot <fengguang...@intel.com> To: kbu...@01.org Cc: Julia Lawall <julia.law...@lip6.fr> Subject: Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock CC: kbuild-...@01.org In-Reply-To: <20170324215627.12831-1-tah...@google.com> TO: Tahsin Erdogan <tah...@google.com> CC: Tejun Heo <t...@kernel.org>, Jens Axboe <ax...@kernel.dk>, linux-block@vger.kernel.org, David Rientjes <rient...@google.com>, linux-ker...@vger.kernel.org, Tahsin Erdogan <tah...@google.com> CC: linux-block@vger.kernel.org, David Rientjes <rient...@google.com>, linux-ker...@vger.kernel.org, Tahsin Erdogan <tah...@google.com> Hi Tahsin, [auto build test WARNING on block/for-next] [also build test WARNING on v4.11-rc3 next-20170324] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Tahsin-Erdogan/blkcg-allocate-struct-blkcg_gq-outside-request-queue-spinlock/20170326-020755 base: https://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux-block.git for-next :: branch date: 6 hours ago :: commit date: 6 hours ago >> block/blk-cgroup.c:901:1-7: preceding lock on line 839 block/blk-cgroup.c:901:1-7: preceding lock on line 876 git remote add linux-review https://github.com/0day-ci/linux git remote update linux-review git checkout d6344a76907e470f483dcb19998d438d19f6432d vim +901 block/blk-cgroup.c d6344a76 Tahsin Erdogan 2017-03-24 833 goto fail; 5f6c2d2b Tejun Heo 2015-07-22 834 } e56da7e2 Tejun Heo 2012-03-05 835 d6344a76 Tahsin Erdogan 2017-03-24 836 q = disk->queue; d6344a76 Tahsin Erdogan 2017-03-24 837 e56da7e2 Tejun Heo 2012-03-05 838 rcu_read_lock(); d6344a76 Tahsin Erdogan 2017-03-24 @839 spin_lock_irq(q->queue_lock); da8b0662 Tejun Heo 2012-04-13 840 d6344a76 Tahsin Erdogan 2017-03-24 841 blkg = blkg_lookup_check(blkcg, pol, q); d6344a76 Tahsin Erdogan 2017-03-24 842 if (IS_ERR(blkg)) { d6344a76 Tahsin Erdogan 2017-03-24 843 ret = PTR_ERR(blkg); d6344a76 Tahsin Erdogan 2017-03-24 844 goto fail_unlock; d6344a76 Tahsin Erdogan 2017-03-24 845 } d6344a76 Tahsin Erdogan 2017-03-24 846 d6344a76 Tahsin Erdogan 2017-03-24 847 if (blkg) d6344a76 Tahsin Erdogan 2017-03-24 848 goto success; d6344a76 Tahsin Erdogan 2017-03-24 849 d6344a76 Tahsin Erdogan 2017-03-24 850 /* d6344a76 Tahsin Erdogan 2017-03-24 851 * Create blkgs walking down from blkcg_root to @blkcg, so that all d6344a76 Tahsin Erdogan 2017-03-24 852 * non-root blkgs have access to their parents. d6344a76 Tahsin Erdogan 2017-03-24 853 */ d6344a76 Tahsin Erdogan 2017-03-24 854 while (true) { d6344a76 Tahsin Erdogan 2017-03-24 855 struct blkcg *pos = blkcg; d6344a76 Tahsin Erdogan 2017-03-24 856 struct blkcg *parent; d6344a76 Tahsin Erdogan 2017-03-24 857 struct blkcg_gq *new_blkg; e56da7e2 Tejun Heo 2012-03-05 858 d6344a76 Tahsin Erdogan 2017-03-24 859 parent = blkcg_parent(blkcg); d6344a76 Tahsin Erdogan 2017-03-24 860 while (parent && !__blkg_lookup(parent, q, false)) { d6344a76 Tahsin Erdogan 2017-03-24 861 pos = parent; d6344a76 Tahsin Erdogan 2017-03-24 862 parent = blkcg_parent(parent); d6344a76 Tahsin Erdogan 2017-03-24 863 } d6344a76 Tahsin Erdogan 2017-03-24 864 d6344a76 Tahsin Erdogan 2017-03-24 865 /* Drop locks to do new blkg allocation with GFP_KERNEL. */ d6344a76 Tahsin Erdogan 2017-03-24 866 spin_unlock_irq(q->queue_lock); d6344a76 Tahsin Erdogan 2017-03-24 867 rcu_read_unlock(); d6344a76 Tahsin Erdogan 2017-03-24 868 d6344a76 Tahsin Erdogan 2017-03-24 869 new_blkg = blkg_alloc(pos, q, GFP_KERNEL); d6344a76 Tahsin Erdogan 2017-03-24 870 if (unlikely(!new_blkg)) { d6344a76 Tahsin Erdogan 2017-03-24 871 ret = -ENOMEM; d6344a76 Tahsin Erdogan 2017-03-24 872 goto fail; d6344a76 Tahsin Erdogan 2017-03-24 873 } d6344a76 Tahsin Erdogan 2017-03-24 874 d6344a76 Tahsin Erdogan 2017-03-24 875 rcu_read_lock(); d6344a76 Tahsin Erdogan 2017-03-24 876 spin_lock_irq(q->queue_lock); d6344a76 Tahsin Erdogan 2017-03-24 877 d6344a76 Tahsin Erdogan 2017-03-24 878 blkg = blkg_lookup_check(pos, pol, q); e56da7e2 Tejun Heo 2012-03-05 879 if (IS_ERR(blkg)) { e56da7e2 Tejun Heo
Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock
Hello, On Tue, Feb 28, 2017 at 03:51:27PM -0800, Tahsin Erdogan wrote: > On Tue, Feb 28, 2017 at 2:47 PM, Tejun Heowrote: > >> + if (!blkcg_policy_enabled(q, pol)) { > >> + ret = -EOPNOTSUPP; > >> + goto fail; > > > > Pulling this out of the queue_lock doesn't seem safe to me. This > > function may end up calling into callbacks of disabled policies this > > way. > > I will move this to within the lock. To make things safe, I am also > thinking of rechecking both blkcg_policy_enabled() and > blk_queue_bypass() after reacquiring the locks in each iteration. > > >> + parent = blkcg_parent(blkcg); > >> + while (parent && !__blkg_lookup(parent, q, false)) { > >> + pos = parent; > >> + parent = blkcg_parent(parent); > >> + } > > > > Hmm... how about adding @new_blkg to blkg_lookup_create() and calling > > it with non-NULL @new_blkg until it succeeds? Wouldn't that be > > simpler? > > > >> + > >> + new_blkg = blkg_alloc(pos, q, GFP_KERNEL); > > The challenge with that approach is creating a new_blkg with the right > blkcg before passing to blkg_lookup_create(). blkg_lookup_create() > walks down the hierarchy and will try to fill the first missing entry > and the preallocated new_blkg must have been created with the right > blkcg (feel free to send a code fragment if you think I am > misunderstanding the suggestion). Ah, indeed, but we can break out allocation of blkg and its initialization, right? It's a bit more work but then we'd be able to do something like. retry: new_blkg = alloc; lock; sanity checks; blkg = blkg_lookup_and_create(..., new_blkg); if (!blkg) { unlock; goto retry; } Thanks. -- tejun
Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock
On Tue, Feb 28, 2017 at 2:47 PM, Tejun Heowrote: >> + if (!blkcg_policy_enabled(q, pol)) { >> + ret = -EOPNOTSUPP; >> + goto fail; > > Pulling this out of the queue_lock doesn't seem safe to me. This > function may end up calling into callbacks of disabled policies this > way. I will move this to within the lock. To make things safe, I am also thinking of rechecking both blkcg_policy_enabled() and blk_queue_bypass() after reacquiring the locks in each iteration. >> + parent = blkcg_parent(blkcg); >> + while (parent && !__blkg_lookup(parent, q, false)) { >> + pos = parent; >> + parent = blkcg_parent(parent); >> + } > > Hmm... how about adding @new_blkg to blkg_lookup_create() and calling > it with non-NULL @new_blkg until it succeeds? Wouldn't that be > simpler? > >> + >> + new_blkg = blkg_alloc(pos, q, GFP_KERNEL); The challenge with that approach is creating a new_blkg with the right blkcg before passing to blkg_lookup_create(). blkg_lookup_create() walks down the hierarchy and will try to fill the first missing entry and the preallocated new_blkg must have been created with the right blkcg (feel free to send a code fragment if you think I am misunderstanding the suggestion).
Re: [PATCH] blkcg: allocate struct blkcg_gq outside request queue spinlock
Hello, Overall, the approach looks good to me but please see below. On Mon, Feb 27, 2017 at 06:49:57PM -0800, Tahsin Erdogan wrote: > @@ -806,44 +807,99 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct > blkcg_policy *pol, > if (!disk) > return -ENODEV; > if (part) { > - owner = disk->fops->owner; > - put_disk(disk); > - module_put(owner); > - return -ENODEV; > + ret = -ENODEV; > + goto fail; > + } > + > + q = disk->queue; > + > + if (!blkcg_policy_enabled(q, pol)) { > + ret = -EOPNOTSUPP; > + goto fail; Pulling this out of the queue_lock doesn't seem safe to me. This function may end up calling into callbacks of disabled policies this way. > + /* > + * Create blkgs walking down from blkcg_root to @blkcg, so that all > + * non-root blkgs have access to their parents. > + */ > + while (true) { > + struct blkcg *pos = blkcg; > + struct blkcg *parent; > + struct blkcg_gq *new_blkg; > + > + parent = blkcg_parent(blkcg); > + while (parent && !__blkg_lookup(parent, q, false)) { > + pos = parent; > + parent = blkcg_parent(parent); > + } Hmm... how about adding @new_blkg to blkg_lookup_create() and calling it with non-NULL @new_blkg until it succeeds? Wouldn't that be simpler? > + > + new_blkg = blkg_alloc(pos, q, GFP_KERNEL); > + if (unlikely(!new_blkg)) { > + ret = -ENOMEM; > + goto fail; > + } > + > + rcu_read_lock(); > + spin_lock_irq(q->queue_lock); > + > + /* Lookup again since we dropped the lock for blkg_alloc(). */ > + blkg = __blkg_lookup(pos, q, false); > + if (blkg) { > + blkg_free(new_blkg); > + } else { > + blkg = blkg_create(pos, q, new_blkg); > + if (unlikely(IS_ERR(blkg))) { > + ret = PTR_ERR(blkg); > + goto fail_unlock; > + } than duplicating the same logic here? Thanks. -- tejun