Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Thu, 9 Feb 2017, Michal Hocko wrote:

> Christoph, you are completely ignoring the reality and the code. There
> is no need for stop_machine nor is it helping anything. As a matter of
> fact, synchronization with cpu hotplug is needed if you want to do
> per-cpu specific operations. get_online_cpus is the most
> straightforward and heavy weight way to do this synchronization but
> not the only one. As the patch [1] describes, we do not really need
> get_online_cpus in drain_all_pages because we can do _better_. But
> this is not in any way a generic thing applicable to other code paths.
>
> If you disagree then you are free to post patches, but the hand waving
> you are doing here is just wasting everybody's time. So please cut it
> here unless you have specific proposals to improve the current
> situation.

I am fine with the particular solution here for this particular problem.
My problem is the general way of having to synchronize via
get_online_cpus() because of cpu hotplug operations.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Thu 09-02-17 11:22:49, Christoph Lameter wrote:
> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > You are just not getting it, really.
> >
> > The problem is that this for_each_online_cpu() is racy against a
> > concurrent hot unplug and therefore can queue stuff for a no longer
> > online cpu. That's what the mm folks tried to avoid by preventing a
> > CPU hotplug operation before entering that loop.
>
> With a stop machine action it is NOT racy because the machine goes into a
> special kernel state that guarantees that key operating system structures
> are not touched. See mm/page_alloc.c's use of that characteristic to build
> zonelists. Thus it cannot be executing for_each_online_cpu and related
> tasks (unless one does not disable preempt, but that is a given if a
> spinlock has been taken).

Christoph, you are completely ignoring the reality and the code. There is
no need for stop_machine nor is it helping anything. As a matter of fact,
synchronization with cpu hotplug is needed if you want to do per-cpu
specific operations. get_online_cpus is the most straightforward and heavy
weight way to do this synchronization but not the only one. As the patch
[1] describes, we do not really need get_online_cpus in drain_all_pages
because we can do _better_. But this is not in any way a generic thing
applicable to other code paths.

If you disagree then you are free to post patches, but the hand waving you
are doing here is just wasting everybody's time. So please cut it here
unless you have specific proposals to improve the current situation.

Thanks!

[1] http://lkml.kernel.org/r/20170207201950.20482-1-mho...@kernel.org
--
Michal Hocko
SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Thu, 9 Feb 2017, Christoph Lameter wrote:
> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > You are just not getting it, really.
> >
> > The problem is that this for_each_online_cpu() is racy against a
> > concurrent hot unplug and therefore can queue stuff for a no longer
> > online cpu. That's what the mm folks tried to avoid by preventing a
> > CPU hotplug operation before entering that loop.
>
> With a stop machine action it is NOT racy because the machine goes into a
> special kernel state that guarantees that key operating system structures
> are not touched. See mm/page_alloc.c's use of that characteristic to build
> zonelists. Thus it cannot be executing for_each_online_cpu and related
> tasks (unless one does not disable preempt, but that is a given if a
> spinlock has been taken).

drain_all_pages() is called from preemptible context. So what are you
talking about again?

> > > Let's get rid of get_online_cpus() etc.
> >
> > And that solves what?
>
> It gets rid of future issues with serialization in paths where we need
> to lock and still do for_each_online_cpu().

There are code paths which might sleep inside the loop, so
get_online_cpus() is the only way to serialize against hotplug. Just
because the only tool you know is stop machine, it does not make
everything an atomic context where it can be applied.

> > Can you please start to understand the scope of the whole hotplug
> > machinery including the requirements for get_online_cpus() before you
> > waste everybody's time with your uninformed and half-baked proposals?
>
> It's an obvious solution to the issues that have arisen multiple times
> with get_online_cpus() within the slab allocators. The hotplug machinery
> should make things as easy as possible for other people and having these
> get_online_cpus() everywhere does complicate things.

It's not an obvious solution to everything. It's context dependent and
people have to think hard how to solve their problem within the context
they are dealing with.

Your 'get rid of get_online_cpus()' mantra does not make all of this
magically simple. Relying on the fact that the CPU online bit is cleared
via stomp_machine(), which is a horrible operation in various aspects,
only applies to a very small subset of problems. You can repeat your
mantra another thousand times and that will not make the way larger set
of problems magically disappear.

Thanks,

	tglx
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Thu, 9 Feb 2017, Thomas Gleixner wrote:

> You are just not getting it, really.
>
> The problem is that this for_each_online_cpu() is racy against a
> concurrent hot unplug and therefore can queue stuff for a no longer
> online cpu. That's what the mm folks tried to avoid by preventing a CPU
> hotplug operation before entering that loop.

With a stop machine action it is NOT racy because the machine goes into a
special kernel state that guarantees that key operating system structures
are not touched. See mm/page_alloc.c's use of that characteristic to build
zonelists. Thus it cannot be executing for_each_online_cpu and related
tasks (unless one does not disable preempt, but that is a given if a
spinlock has been taken).

> > Let's get rid of get_online_cpus() etc.
>
> And that solves what?

It gets rid of future issues with serialization in paths where we need to
lock and still do for_each_online_cpu().

> Can you please start to understand the scope of the whole hotplug
> machinery including the requirements for get_online_cpus() before you
> waste everybody's time with your uninformed and half-baked proposals?

It's an obvious solution to the issues that have arisen multiple times
with get_online_cpus() within the slab allocators. The hotplug machinery
should make things as easy as possible for other people and having these
get_online_cpus() everywhere does complicate things.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Thu, 9 Feb 2017, Christoph Lameter wrote:
> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > > The stop_machine would need to ensure that all cpus cease processing
> > > before proceeding.
> >
> > Ok. I try again:
> >
> > CPU 0                          CPU 1
> > for_each_online_cpu(cpu)
> >   ==> cpu = 1
> >                                stop_machine()
> >
> > Stops processing on all CPUs by preempting the current execution and
> > forcing them into a high priority busy loop with interrupts disabled.
>
> Exactly. That means we are outside of the sections marked with
> get_online_cpus().
>
> > It does exactly what you describe. It stops processing on all other
> > cpus until release, but that does not invalidate any data on those
> > cpus.
>
> Why would it need to invalidate any data? The change of the cpu masks
> would need to be done when the machine is stopped. This sounds exactly
> like what we need and much of it is already there.

You are just not getting it, really.

The problem is that this for_each_online_cpu() is racy against a
concurrent hot unplug and therefore can queue stuff for a no longer
online cpu. That's what the mm folks tried to avoid by preventing a CPU
hotplug operation before entering that loop.

> Let's get rid of get_online_cpus() etc.

And that solves what?

Can you please start to understand the scope of the whole hotplug
machinery including the requirements for get_online_cpus() before you
waste everybody's time with your uninformed and half-baked proposals?

Thanks,

	tglx
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Thu, 9 Feb 2017, Thomas Gleixner wrote:

> > The stop_machine would need to ensure that all cpus cease processing
> > before proceeding.
>
> Ok. I try again:
>
> CPU 0                          CPU 1
> for_each_online_cpu(cpu)
>   ==> cpu = 1
>                                stop_machine()
>
> Stops processing on all CPUs by preempting the current execution and
> forcing them into a high priority busy loop with interrupts disabled.

Exactly. That means we are outside of the sections marked with
get_online_cpus().

> It does exactly what you describe. It stops processing on all other cpus
> until release, but that does not invalidate any data on those cpus.

Why would it need to invalidate any data? The change of the cpu masks
would need to be done when the machine is stopped. This sounds exactly
like what we need and much of it is already there.

Let's get rid of get_online_cpus() etc.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Thu, 9 Feb 2017, Christoph Lameter wrote:
> On Thu, 9 Feb 2017, Thomas Gleixner wrote:
>
> > And how does that solve the problem at hand? Not at all:
> >
> > CPU 0                          CPU 1
> >
> > for_each_online_cpu(cpu)
> >   ==> cpu = 1
> >                                stop_machine()
> >                                set_cpu_online(1, false)
> > queue_work(cpu1)
>
> Well that's not how I remember stop_machine works. Doesn't it stop the
> processing on all cpus? Otherwise it's not a really usable stop.
>
> The stop_machine would need to ensure that all cpus cease processing
> before proceeding.

Ok. I try again:

CPU 0                                  CPU 1
for_each_online_cpu(cpu)
  ==> cpu = 1
                                       stop_machine()

Stops processing on all CPUs by preempting the current execution and
forcing them into a high priority busy loop with interrupts disabled.

context_switch()
  stomper_thread()
    busyloop()
                                       set_cpu_online(1, false)
                                       stop_machine end()
                                       release busy looping CPUs
context_switch()

Resumes operation at the preemption point. cpu is still 1.

queue_work(cpu == 1)

It does exactly what you describe. It stops processing on all other cpus
until release, but that does not invalidate any data on those cpus. It's
been that way forever.

Thanks,

	tglx
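[Editorial aside: the interleaving above can be simulated in plain
userspace Python. This is a sketch, not kernel code; `for_each_online_cpu`
and `queue_work` merely mimic the kernel names. It shows why a
stop-the-world pause does not help: the loop variable captured before the
pause is still valid Python state afterwards, so work can be queued on the
cpu that went offline during the pause.]

```python
# Simulate a thread iterating a snapshot of the online-cpu mask while a
# stop_machine()-style pause flips the mask on "another CPU".

online = {0, 1}

def for_each_online_cpu():
    # Snapshot taken once when the loop starts, exactly like the kernel
    # loop: later mask changes do not invalidate the iterator.
    return list(online)

queued = []

def queue_work(cpu):
    queued.append(cpu)

it = iter(for_each_online_cpu())
cpu = next(it)           # cpu = 0, work for cpu 0 would be queued here
cpu = next(it)           # cpu = 1 ... and now the thread is "preempted"

# "stop_machine()" runs on the other CPU: every thread pauses, the mask
# is updated, then every thread resumes exactly where it stopped.
online.discard(1)        # set_cpu_online(1, false)

queue_work(cpu)          # cpu is still 1: work queued for an offline cpu
print(queued, 1 in online)
```

Running this prints `[1] False`: the stale `cpu` value survives the pause,
which is the race get_online_cpus() exists to prevent.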
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Thu, 9 Feb 2017, Thomas Gleixner wrote:

> And how does that solve the problem at hand? Not at all:
>
> CPU 0                          CPU 1
>
> for_each_online_cpu(cpu)
>   ==> cpu = 1
>                                stop_machine()
>                                set_cpu_online(1, false)
> queue_work(cpu1)
>
> Thanks,

Well that's not how I remember stop_machine works. Doesn't it stop the
processing on all cpus? Otherwise it's not a really usable stop.

The stop_machine would need to ensure that all cpus cease processing
before proceeding.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, 8 Feb 2017, Christoph Lameter wrote:
> On Wed, 8 Feb 2017, Thomas Gleixner wrote:
>
> > There is a world outside yours. Hotplug is actually used frequently for
> > power purposes in some scenarios.
>
> The usual case does not involve hotplug.

We do not care about your definition of "usual". The kernel serves _ALL_
use cases.

> > It will improve nothing. The stop machine context is extremely limited
> > and you cannot do complex things there at all. Not to talk about the
> > inability of taking a simple mutex which would immediately deadlock
> > the machine.
>
> You do not need to do complex things. Basically flipping some cpu mask
> bits will do it. stop machine ensures that code is not executing on the
> processors when the bits are flipped. That will ensure that there is no
> need to do any get_online_cpus() nastiness in critical VM paths since we
> are guaranteed not to be executing them.

And how does that solve the problem at hand? Not at all:

CPU 0                          CPU 1

for_each_online_cpu(cpu)
  ==> cpu = 1
                               stop_machine()
                               set_cpu_online(1, false)
queue_work(cpu1)

Thanks,

	tglx
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, 8 Feb 2017, Thomas Gleixner wrote:

> There is a world outside yours. Hotplug is actually used frequently for
> power purposes in some scenarios.

The usual case does not involve hotplug.

> It will improve nothing. The stop machine context is extremely limited
> and you cannot do complex things there at all. Not to talk about the
> inability of taking a simple mutex which would immediately deadlock the
> machine.

You do not need to do complex things. Basically flipping some cpu mask
bits will do it. stop machine ensures that code is not executing on the
processors when the bits are flipped. That will ensure that there is no
need to do any get_online_cpus() nastiness in critical VM paths since we
are guaranteed not to be executing them.

> And everything complex needs to be done _before_ that in normal
> context. Hot unplug already uses stop machine for the final removal of
> the outgoing CPU, but that's definitely not the place where you can do
> anything complex like page management.

If it already does that then why do we still need get_online_cpus()? We do
not do anything like page management. Why would we? We just need to ensure
that nothing is executing when the bits are flipped. If that is the case
then the get_online_cpus() calls are unnecessary because the bit flipping
simply cannot occur in these functions. There is nothing to serialize
against.

> If you can prepare the outgoing cpu work during the cpu offline phase
> and then just flip a bit in the stop machine part, then this might work,
> but anything else is just handwaving and proliferation of wet dreams.

Fine with that.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, 8 Feb 2017, Christoph Lameter wrote:
> On Wed, 8 Feb 2017, Michal Hocko wrote:
>
> > I have no idea what you are trying to say and how this is related to
> > the deadlock we are discussing here. We certainly do not need to add
> > stop_machine to the problem. And yeah, dropping get_online_cpus was
> > possible after considering all fallouts.
>
> This is not the first time get_online_cpus() causes problems due to the
> need to support hotplug for processors. Hotplugging is not happening
> frequently (which is low balling it; actually the frequency of the
> hotplug events on almost all systems is zero), so the constant check is
> a useless overhead and causes trouble for development. In particular,
> get_online_cpus() is often needed in sections that need to hold locks.

There is a world outside yours. Hotplug is actually used frequently for
power purposes in some scenarios.

> So let's get rid of it. The severity, frequency and rarity of processor
> hotplug events would justify only allowing adding and removal of
> processors through the stop_machine_xx mechanism. With that in place the
> processor masks can be used without synchronization and the locking
> issues all over the kernel would become simpler.
>
> It is likely that this will even improve the hotplug code because the
> easier form of synchronization (you have a piece of code that executes
> while the OS is in stop state) would allow to make more significant
> changes to the software environment. F.e. one could think about removing
> memory segments as well as maybe per cpu segments.

It will improve nothing. The stop machine context is extremely limited
and you cannot do complex things there at all. Not to talk about the
inability of taking a simple mutex which would immediately deadlock the
machine. stop machine is the last resort for things which need to be done
atomically and that operation can be done in a very restricted context.

And everything complex needs to be done _before_ that in normal context.
Hot unplug already uses stop machine for the final removal of the
outgoing CPU, but that's definitely not the place where you can do
anything complex like page management.

If you can prepare the outgoing cpu work during the cpu offline phase and
then just flip a bit in the stop machine part, then this might work, but
anything else is just handwaving and proliferation of wet dreams.

Thanks,

	tglx
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed 08-02-17 09:11:06, Christoph Lameter wrote:
> On Wed, 8 Feb 2017, Michal Hocko wrote:
>
> > > Huch? stop_machine() is horrible and heavy weight. Don't go there,
> > > there must be simpler solutions than that.
> >
> > Absolutely agreed. We are in the page allocator path so using the
> > stop_machine* is just ridiculous. And, in fact, there is a much
> > simpler solution [1]
>
> That is nonsense. stop_machine would be used when adding or removing a
> processor. There would be no need to synchronize when looping over
> active cpus anymore. get_online_cpus() etc would be removed from the hot
> path since the cpu masks are guaranteed to be stable.

I have no idea what you are trying to say and how this is related to the
deadlock we are discussing here. We certainly do not need to add
stop_machine to the problem. And yeah, dropping get_online_cpus was
possible after considering all fallouts.
--
Michal Hocko
SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, Feb 08, 2017 at 02:03:32PM +, Mel Gorman wrote:
> > Yeah, we'll sort that out once it hits Linus tree and we move RT
> > forward. Though I have one complaint right away:
> >
> > +	preempt_enable_no_resched();
> >
> > This is a nono, even in mainline. You effectively disable a preemption
> > point.
>
> This came up during review on whether it should or shouldn't be a
> preemption point. Initially it was preempt_enable() but a preemption
> point didn't exist before, the reviewer pushed for it and as it was the
> allocator fast path that was unlikely to need a reschedule or preempt, I
> made the change.

Not relevant. The only acceptable use of preempt_enable_no_resched() is
if the next statement is a schedule() variant.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, 8 Feb 2017, Michal Hocko wrote:

> I have no idea what you are trying to say and how this is related to the
> deadlock we are discussing here. We certainly do not need to add
> stop_machine to the problem. And yeah, dropping get_online_cpus was
> possible after considering all fallouts.

This is not the first time get_online_cpus() causes problems due to the
need to support hotplug for processors. Hotplugging is not happening
frequently (which is low balling it; actually the frequency of the
hotplug events on almost all systems is zero), so the constant check is a
useless overhead and causes trouble for development. In particular,
get_online_cpus() is often needed in sections that need to hold locks.

So let's get rid of it. The severity, frequency and rarity of processor
hotplug events would justify only allowing adding and removal of
processors through the stop_machine_xx mechanism. With that in place the
processor masks can be used without synchronization and the locking
issues all over the kernel would become simpler.

It is likely that this will even improve the hotplug code because the
easier form of synchronization (you have a piece of code that executes
while the OS is in stop state) would allow to make more significant
changes to the software environment. F.e. one could think about removing
memory segments as well as maybe per cpu segments.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, 8 Feb 2017, Michal Hocko wrote:

> > Huch? stop_machine() is horrible and heavy weight. Don't go there,
> > there must be simpler solutions than that.
>
> Absolutely agreed. We are in the page allocator path so using the
> stop_machine* is just ridiculous. And, in fact, there is a much simpler
> solution [1]

That is nonsense. stop_machine would be used when adding or removing a
processor. There would be no need to synchronize when looping over active
cpus anymore. get_online_cpus() etc would be removed from the hot path
since the cpu masks are guaranteed to be stable.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, 7 Feb 2017, Thomas Gleixner wrote:

> > Yep. Hotplug events are pretty significant. Using stop_machine_() etc
> > would be advisable and that would avoid the taking of locks and get
> > rid of all the complexity, reduce the code size and make the overall
> > system much more reliable.
>
> Huch? stop_machine() is horrible and heavy weight. Don't go there, there
> must be simpler solutions than that.

Inserting or removing hardware is a heavy process. This would help quite
a bit with these issues for loops over active cpus.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, 8 Feb 2017, Mel Gorman wrote:

> It may be worth noting that patches in Andrew's tree no longer disable
> interrupts in the per-cpu allocator and now per-cpu draining will be
> from workqueue context. The reasoning was due to the overhead of the
> page allocator with figures included. Interrupts will bypass the per-cpu
> allocator and use the irq-safe zone->lock to allocate from the core.
> It'll collide with the RT patch. Primary patch of interest is
> http://www.ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-only-use-per-cpu-allocator-for-irq-safe-requests.patch

Yeah, we'll sort that out once it hits Linus tree and we move RT forward.
Though I have one complaint right away:

+	preempt_enable_no_resched();

This is a nono, even in mainline. You effectively disable a preemption
point.

> The draining from workqueue context may be a problem for RT but one
> option would be to move the drain to only drain for high-order pages
> after direct reclaim combined with only draining for order-0 if
> __alloc_pages_may_oom is about to be called.

Why would the draining from workqueue context be an issue on RT?

Thanks,

	tglx
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, Feb 08, 2017 at 02:23:19PM +0100, Thomas Gleixner wrote:
> On Wed, 8 Feb 2017, Mel Gorman wrote:
> > It may be worth noting that patches in Andrew's tree no longer disable
> > interrupts in the per-cpu allocator and now per-cpu draining will be
> > from workqueue context. The reasoning was due to the overhead of the
> > page allocator with figures included. Interrupts will bypass the
> > per-cpu allocator and use the irq-safe zone->lock to allocate from the
> > core. It'll collide with the RT patch. Primary patch of interest is
> > http://www.ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-only-use-per-cpu-allocator-for-irq-safe-requests.patch
>
> Yeah, we'll sort that out once it hits Linus tree and we move RT
> forward. Though I have one complaint right away:
>
> +	preempt_enable_no_resched();
>
> This is a nono, even in mainline. You effectively disable a preemption
> point.

This came up during review on whether it should or shouldn't be a
preemption point. Initially it was preempt_enable() but a preemption
point didn't exist before, the reviewer pushed for it and as it was the
allocator fast path that was unlikely to need a reschedule or preempt, I
made the change. I can alter it before it hits mainline if you say RT is
going to have an issue with it.

> > The draining from workqueue context may be a problem for RT but one
> > option would be to move the drain to only drain for high-order pages
> > after direct reclaim combined with only draining for order-0 if
> > __alloc_pages_may_oom is about to be called.
>
> Why would the draining from workqueue context be an issue on RT?

It probably isn't. The latency of the operation is likely longer than an
IPI was but given the context it occurs in, I severely doubted it
mattered. I couldn't think of a reason why it would matter to RT but
there was no harm double checking.
--
Mel Gorman
SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, Feb 08, 2017 at 01:02:07PM +0100, Thomas Gleixner wrote:
> On Wed, 8 Feb 2017, Michal Hocko wrote:
> > On Tue 07-02-17 23:25:17, Thomas Gleixner wrote:
> > > On Tue, 7 Feb 2017, Christoph Lameter wrote:
> > > > On Tue, 7 Feb 2017, Michal Hocko wrote:
> > > >
> > > > > I am always nervous when seeing hotplug locks being used in low
> > > > > level code. It has bitten us several times already and those
> > > > > deadlocks are quite hard to spot when reviewing the code and
> > > > > very rare to hit so they tend to live for a long time.
> > > >
> > > > Yep. Hotplug events are pretty significant. Using stop_machine_()
> > > > etc would be advisable and that would avoid the taking of locks
> > > > and get rid of all the complexity, reduce the code size and make
> > > > the overall system much more reliable.
> > >
> > > Huch? stop_machine() is horrible and heavy weight. Don't go there,
> > > there must be simpler solutions than that.
> >
> > Absolutely agreed. We are in the page allocator path so using the
> > stop_machine* is just ridiculous. And, in fact, there is a much
> > simpler solution [1]
> >
> > [1] http://lkml.kernel.org/r/20170207201950.20482-1-mho...@kernel.org
>
> Well, yes. It's simple, but from an RT point of view I really don't like
> it as we have to fix it up again.
>
> On RT we solved the problem of the page allocator differently, which
> allows us to do drain_all_pages() from the caller CPU as a side effect.
> That's interesting not only for RT, it's also interesting for NOHZ FULL
> scenarios because you don't inflict the work on the other CPUs.
>
> https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=linux-4.9.y-rt-rebase&id=d577a017da694e29a06af057c517f2a7051eb305

It may be worth noting that patches in Andrew's tree no longer disable
interrupts in the per-cpu allocator and now per-cpu draining will be from
workqueue context. The reasoning was due to the overhead of the page
allocator with figures included. Interrupts will bypass the per-cpu
allocator and use the irq-safe zone->lock to allocate from the core.
It'll collide with the RT patch. Primary patch of interest is
http://www.ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-only-use-per-cpu-allocator-for-irq-safe-requests.patch

The draining from workqueue context may be a problem for RT but one
option would be to move the drain to only drain for high-order pages
after direct reclaim combined with only draining for order-0 if
__alloc_pages_may_oom is about to be called.
--
Mel Gorman
SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed 08-02-17 13:02:07, Thomas Gleixner wrote:
> On Wed, 8 Feb 2017, Michal Hocko wrote:
[...]
> > [1] http://lkml.kernel.org/r/20170207201950.20482-1-mho...@kernel.org
>
> Well, yes. It's simple, but from an RT point of view I really don't like
> it as we have to fix it up again.

I thought that preempt_disable would turn into migrate_disable or
something like that, which shouldn't cause too much trouble. Or am I
missing something? Which part of the patch is so RT unfriendly?
--
Michal Hocko
SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Wed, 8 Feb 2017, Michal Hocko wrote:
> On Tue 07-02-17 23:25:17, Thomas Gleixner wrote:
> > On Tue, 7 Feb 2017, Christoph Lameter wrote:
> > > On Tue, 7 Feb 2017, Michal Hocko wrote:
> > >
> > > > I am always nervous when seeing hotplug locks being used in low
> > > > level code. It has bitten us several times already and those
> > > > deadlocks are quite hard to spot when reviewing the code and very
> > > > rare to hit so they tend to live for a long time.
> > >
> > > Yep. Hotplug events are pretty significant. Using stop_machine_()
> > > etc would be advisable and that would avoid the taking of locks and
> > > get rid of all the complexity, reduce the code size and make the
> > > overall system much more reliable.
> >
> > Huch? stop_machine() is horrible and heavy weight. Don't go there,
> > there must be simpler solutions than that.
>
> Absolutely agreed. We are in the page allocator path so using the
> stop_machine* is just ridiculous. And, in fact, there is a much simpler
> solution [1]
>
> [1] http://lkml.kernel.org/r/20170207201950.20482-1-mho...@kernel.org

Well, yes. It's simple, but from an RT point of view I really don't like
it as we have to fix it up again.

On RT we solved the problem of the page allocator differently, which
allows us to do drain_all_pages() from the caller CPU as a side effect.
That's interesting not only for RT, it's also interesting for NOHZ FULL
scenarios because you don't inflict the work on the other CPUs.

https://git.kernel.org/cgit/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=linux-4.9.y-rt-rebase&id=d577a017da694e29a06af057c517f2a7051eb305

That uses local locks (an RT speciality which compiles away into
preempt/irq disable/enable when RT is disabled). Works like a charm :)

Thanks,

	tglx
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 23:25:17, Thomas Gleixner wrote:
> On Tue, 7 Feb 2017, Christoph Lameter wrote:
> > On Tue, 7 Feb 2017, Michal Hocko wrote:
> >
> > > I am always nervous when seeing hotplug locks being used in low
> > > level code. It has bitten us several times already and those
> > > deadlocks are quite hard to spot when reviewing the code and very
> > > rare to hit so they tend to live for a long time.
> >
> > Yep. Hotplug events are pretty significant. Using stop_machine_() etc
> > would be advisable and that would avoid the taking of locks and get
> > rid of all the complexity, reduce the code size and make the overall
> > system much more reliable.
>
> Huch? stop_machine() is horrible and heavy weight. Don't go there, there
> must be simpler solutions than that.

Absolutely agreed. We are in the page allocator path so using the
stop_machine* is just ridiculous. And, in fact, there is a much simpler
solution [1]

[1] http://lkml.kernel.org/r/20170207201950.20482-1-mho...@kernel.org
--
Michal Hocko
SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Mon, 6 Feb 2017, Dmitry Vyukov wrote:
> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote:
> Unfortunately it does not seem to help.
> Fuzzer now runs on 510948533b059f4f5033464f9f4a0c32d4ab0c08 of
> mmotm/auto-latest
> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git):
>
> commit 510948533b059f4f5033464f9f4a0c32d4ab0c08
> Date:   Thu Feb 2 10:08:47 2017 +0100
>     mmotm: userfaultfd-non-cooperative-add-event-for-memory-unmaps-fix
>
> The commit you referenced is already there:
>
> commit 806b158031ca0b4714e775898396529a758ebc2c
> Date:   Thu Feb 2 08:53:16 2017 +0100
>     mm, page_alloc: use static global work_struct for draining per-cpu pages
>
> Chain exists of:
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>        ----                    ----
>   lock(pcpu_alloc_mutex);
>                                lock(cpu_hotplug.lock);
>                                lock(pcpu_alloc_mutex);
>   lock(cpu_hotplug.dep_map);

And that's exactly what happens:

cpu_up()                         alloc_percpu()
  lock(hotplug.lock)               lock(&pcpu_alloc_mutex)
  ...
  alloc_percpu()                   drain_all_pages()
    lock(&pcpu_alloc_mutex)          get_online_cpus()
                                       lock(hotplug.lock)

Classic deadlock, i.e. you _cannot_ call get_online_cpus() while holding
pcpu_alloc_mutex. Alternatively you can forbid doing per cpu alloc/free
while holding hotplug.lock. I doubt that this will make people happy :)

Thanks,

	tglx
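[Editorial aside: the ABBA inversion tglx spells out is exactly what the
kernel's lockdep detects by recording lock acquisition order. A minimal
userspace sketch in Python (not lockdep itself; the class and lock names
are illustrative) shows how recording "A taken while B held" edges
exposes the pcpu_alloc_mutex/hotplug.lock cycle without ever having to
hit the actual hang:]

```python
# Minimal lockdep-style order checker: record which locks were taken
# while another lock was held, and complain when the reverse order shows up.

class LockOrderTracker:
    def __init__(self):
        self.held = []    # locks currently held by this "thread"
        self.after = {}   # lock -> set of locks ever acquired while it was held

    def acquire(self, lock):
        for h in self.held:
            self.after.setdefault(h, set()).add(lock)
            # If 'h' was ever acquired while 'lock' was held, the two
            # orders contradict each other: possible ABBA deadlock.
            if h in self.after.get(lock, set()):
                raise RuntimeError(
                    f"possible deadlock: {h} -> {lock} vs {lock} -> {h}")
        self.held.append(lock)

    def release(self, lock):
        self.held.remove(lock)

t = LockOrderTracker()

# Path 1: cpu_up() takes hotplug.lock, then alloc_percpu() takes
# pcpu_alloc_mutex.
t.acquire("hotplug.lock")
t.acquire("pcpu_alloc_mutex")
t.release("pcpu_alloc_mutex")
t.release("hotplug.lock")

# Path 2: alloc_percpu() takes pcpu_alloc_mutex, then drain_all_pages()
# calls get_online_cpus() which takes hotplug.lock: inverted order.
t.acquire("pcpu_alloc_mutex")
try:
    t.acquire("hotplug.lock")
except RuntimeError as e:
    print(e)
```

This prints the inversion report rather than deadlocking, which is the
whole point of order checking: the bug is caught on the first run of the
second path, even if the timing never lines up to hang for real.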
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, 7 Feb 2017, Christoph Lameter wrote:
> On Tue, 7 Feb 2017, Michal Hocko wrote:
>
> > I am always nervous when seeing hotplug locks being used in low level
> > code. It has bitten us several times already and those deadlocks are
> > quite hard to spot when reviewing the code and very rare to hit so
> > they tend to live for a long time.
>
> Yep. Hotplug events are pretty significant. Using stop_machine_() etc
> would be advisable and that would avoid the taking of locks and get rid
> of all the complexity, reduce the code size and make the overall system
> much more reliable.

Huch? stop_machine() is horrible and heavy weight. Don't go there, there
must be simpler solutions than that.

Thanks,

	tglx
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 12:03:19, Tejun Heo wrote:
> Hello,
>
> Sorry about the delay.
>
> On Tue, Feb 07, 2017 at 04:34:59PM +0100, Michal Hocko wrote:
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index c3358d4f7932..b6411816787a 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2343,7 +2343,16 @@ void drain_local_pages(struct zone *zone)
> >
> >  static void drain_local_pages_wq(struct work_struct *work)
> >  {
> > +	/*
> > +	 * drain_all_pages doesn't use proper cpu hotplug protection so
> > +	 * we can race with cpu offline when the WQ can move this from
> > +	 * a cpu pinned worker to an unbound one. We can operate on a different
> > +	 * cpu which is allright but we also have to make sure to not move to
> > +	 * a different one.
> > +	 */
> > +	preempt_disable();
> >  	drain_local_pages(NULL);
> > +	preempt_enable();
> >  }
> >
> >  /*
> > @@ -2379,12 +2388,6 @@ void drain_all_pages(struct zone *zone)
> >  	}
> >
> >  	/*
> > -	 * As this can be called from reclaim context, do not reenter reclaim.
> > -	 * An allocation failure can be handled, it's simply slower
> > -	 */
> > -	get_online_cpus();
> > -
> > -	/*
> >  	 * We don't care about racing with CPU hotplug event
> >  	 * as offline notification will cause the notified
> >  	 * cpu to drain that CPU pcps and on_each_cpu_mask
> > @@ -2423,7 +2426,6 @@ void drain_all_pages(struct zone *zone)
> >  	for_each_cpu(cpu, &cpus_with_pcps)
> >  		flush_work(per_cpu_ptr(&pcpu_drain, cpu));
> >
> > -	put_online_cpus();
> >  	mutex_unlock(&pcpu_drain_mutex);
>
> I think this would work; however, a more canonical way would be
> something along the line of...
>
> drain_all_pages()
> {
> 	...
> 	spin_lock();
> 	for_each_possible_cpu() {
> 		if (this cpu should get drained) {
> 			queue_work_on(this cpu's work);
> 		}
> 	}
> 	spin_unlock();
> 	...
> }
>
> offline_hook()
> {
> 	spin_lock();
> 	this cpu should get drained = false;
> 	spin_unlock();
> 	queue_work_on(this cpu's work);
> 	flush_work(this cpu's work);
> }

I see.

> I think what workqueue should do is automatically flush in-flight CPU
> work items on CPU offline and error out on queue_work_on() on offline
> CPUs. And we now actually can do that because we have lifted the
> guarantee that queue_work() is local CPU affine some releases ago.
> I'll look into it soonish.
>
> For the time being, either approach should be fine. The more
> canonical one might be a bit less surprising but the
> preempt_disable/enable() change is short and sweet and completely fine
> for the case at hand.

Thanks for double checking!

> Please feel free to add
>
> Acked-by: Tejun Heo

Thanks!
--
Michal Hocko
SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
Hello, Sorry about the delay. On Tue, Feb 07, 2017 at 04:34:59PM +0100, Michal Hocko wrote: > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index c3358d4f7932..b6411816787a 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2343,7 +2343,16 @@ void drain_local_pages(struct zone *zone) > > static void drain_local_pages_wq(struct work_struct *work) > { > + /* > + * drain_all_pages doesn't use proper cpu hotplug protection so > + * we can race with cpu offline when the WQ can move this from > + * a cpu pinned worker to an unbound one. We can operate on a different > + * cpu which is allright but we also have to make sure to not move to > + * a different one. > + */ > + preempt_disable(); > drain_local_pages(NULL); > + preempt_enable(); > } > > /* > @@ -2379,12 +2388,6 @@ void drain_all_pages(struct zone *zone) > } > > /* > - * As this can be called from reclaim context, do not reenter reclaim. > - * An allocation failure can be handled, it's simply slower > - */ > - get_online_cpus(); > - > - /* >* We don't care about racing with CPU hotplug event >* as offline notification will cause the notified >* cpu to drain that CPU pcps and on_each_cpu_mask > @@ -2423,7 +2426,6 @@ void drain_all_pages(struct zone *zone) > for_each_cpu(cpu, &cpus_with_pcps) > flush_work(per_cpu_ptr(&pcpu_drain, cpu)); > > - put_online_cpus(); > mutex_unlock(&pcpu_drain_mutex); I think this would work; however, a more canonical way would be something along the line of... drain_all_pages() { ... spin_lock(); for_each_possible_cpu() { if (this cpu should get drained) { queue_work_on(this cpu's work); } } spin_unlock(); ... } offline_hook() { spin_lock(); this cpu should get drained = false; spin_unlock(); queue_work_on(this cpu's work); flush_work(this cpu's work); } I think what workqueue should do is automatically flush in-flight CPU work items on CPU offline and erroring out on queue_work_on() on offline CPUs. 
And we now actually can do that because we have lifted the guarantee that queue_work() is local CPU affine some releases ago. I'll look into it soonish. For the time being, either approach should be fine. The more canonical one might be a bit less surprising but the preempt_disable/enable() change is short and sweet and completely fine for the case at hand. Please feel free to add Acked-by: Tejun Heo Thanks. -- tejun
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, 7 Feb 2017, Michal Hocko wrote: > I am always nervous when seeing hotplug locks being used in low level > code. It has bitten us several times already and those deadlocks are > quite hard to spot when reviewing the code and very rare to hit so they > tend to live for a long time. Yep. Hotplug events are pretty significant. Using stop_machine() etc would be advisable and that would avoid the taking of locks and get rid of all the complexity, reduce the code size and make the overall system much more reliable. Thomas?
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 16:22:24, Mel Gorman wrote: > On Tue, Feb 07, 2017 at 04:34:59PM +0100, Michal Hocko wrote: > > > But we do not care about the whole cpu hotplug code. The only part we > > > really do care about is the race inside drain_pages_zone and that will > > > run in an atomic context on the specific CPU. > > > > > > You are absolutely right that using the mutex is safe as well but the > > > hotplug path is already littered with locks and adding one more to the > > > picture doesn't sound great to me. So I would really like to not use a > > > lock if that is possible and safe (with a big fat comment of course). > > > > And with the full changelog. I hope I haven't missed anything this time. > > --- > > From 8c6af3116520251cc4ec2213f0a4ed2544bb4365 Mon Sep 17 00:00:00 2001 > > From: Michal Hocko > > Date: Tue, 7 Feb 2017 16:08:35 +0100 > > Subject: [PATCH] mm, page_alloc: do not depend on cpu hotplug locks inside > > the > > allocator > > > > > > > > Reported-by: Dmitry Vyukov > > Signed-off-by: Michal Hocko > > Not that I can think of. It's almost identical to the diff I posted with > the exception of the mutex in the cpu hotplug teardown path. I agree that > in the current implementation that it should be unnecessary even if I > thought it would be more robust against any other hotplug churn. I am always nervous when seeing hotplug locks being used in low level code. It has bitten us several times already and those deadlocks are quite hard to spot when reviewing the code and very rare to hit so they tend to live for a long time. > Acked-by: Mel Gorman Thanks! I will wait for Tejun to confirm my assumptions are correct and post the patch to Andrew if there are no further problems spotted. Btw. this will also get rid of another lockdep report which seems to be a false positive though http://lkml.kernel.org/r/20170203145548.gc19...@dhcp22.suse.cz -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, Feb 07, 2017 at 04:34:59PM +0100, Michal Hocko wrote: > > But we do not care about the whole cpu hotplug code. The only part we > > really do care about is the race inside drain_pages_zone and that will > > run in an atomic context on the specific CPU. > > > > You are absolutely right that using the mutex is safe as well but the > > hotplug path is already littered with locks and adding one more to the > > picture doesn't sound great to me. So I would really like to not use a > > lock if that is possible and safe (with a big fat comment of course). > > And with the full changelog. I hope I haven't missed anything this time. > --- > From 8c6af3116520251cc4ec2213f0a4ed2544bb4365 Mon Sep 17 00:00:00 2001 > From: Michal Hocko > Date: Tue, 7 Feb 2017 16:08:35 +0100 > Subject: [PATCH] mm, page_alloc: do not depend on cpu hotplug locks inside the > allocator > > > > Reported-by: Dmitry Vyukov > Signed-off-by: Michal Hocko Not that I can think of. It's almost identical to the diff I posted with the exception of the mutex in the cpu hotplug teardown path. I agree that in the current implementation that it should be unnecessary even if I thought it would be more robust against any other hotplug churn. Acked-by: Mel Gorman -- Mel Gorman SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 15:19:11, Michal Hocko wrote: > On Tue 07-02-17 13:58:46, Mel Gorman wrote: > > On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote: > [...] > > > Anyway, shouldn't be it sufficient to disable preemption > > > on drain_local_pages_wq? > > > > That would be sufficient for a hot-removed CPU moving the drain request > > to another CPU and avoiding any scheduling events. > > > > > The CPU hotplug callback will not preempt us > > > and so we cannot work on the same cpus, right? > > > > > > > I don't see a specific guarantee that it cannot be preempted and it > > would depend on an the exact cpu hotplug implementation which is subject > > to quite a lot of change. > > But we do not care about the whole cpu hotplug code. The only part we > really do care about is the race inside drain_pages_zone and that will > run in an atomic context on the specific CPU. > > You are absolutely right that using the mutex is safe as well but the > hotplug path is already littered with locks and adding one more to the > picture doesn't sound great to me. So I would really like to not use a > lock if that is possible and safe (with a big fat comment of course). And with the full changelog. I hope I haven't missed anything this time. 
--- >From 8c6af3116520251cc4ec2213f0a4ed2544bb4365 Mon Sep 17 00:00:00 2001 From: Michal Hocko Date: Tue, 7 Feb 2017 16:08:35 +0100 Subject: [PATCH] mm, page_alloc: do not depend on cpu hotplug locks inside the allocator Dmitry has reported the following lockdep splat [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753 [] __mutex_lock_common kernel/locking/mutex.c:521 [inline] [] mutex_lock_nested+0x24e/0xff0 kernel/locking/mutex.c:621 [] pcpu_alloc+0xbda/0x1280 mm/percpu.c:896 [] __alloc_percpu+0x24/0x30 mm/percpu.c:1075 [] smpcfd_prepare_cpu+0x73/0xd0 kernel/smp.c:44 [] cpuhp_invoke_callback+0x254/0x1480 kernel/cpu.c:136 [] cpuhp_up_callbacks+0x81/0x2a0 kernel/cpu.c:493 [] _cpu_up+0x1e3/0x2a0 kernel/cpu.c:1057 [] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087 [] cpu_up+0x18/0x20 kernel/cpu.c:1095 [] smp_init+0xe9/0xee kernel/smp.c:564 [] kernel_init_freeable+0x439/0x690 init/main.c:1010 [] kernel_init+0x13/0x180 init/main.c:941 [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 cpu_hotplug_begin cpu_hotplug.lock pcpu_alloc pcpu_alloc_mutex [] get_online_cpus+0x62/0x90 kernel/cpu.c:248 [] drain_all_pages+0xf8/0x710 mm/page_alloc.c:2385 [] __alloc_pages_direct_reclaim mm/page_alloc.c:3440 [inline] [] __alloc_pages_slowpath+0x8fd/0x2370 mm/page_alloc.c:3778 [] __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3980 [] __alloc_pages include/linux/gfp.h:426 [inline] [] __alloc_pages_node include/linux/gfp.h:439 [inline] [] alloc_pages_node include/linux/gfp.h:453 [inline] [] pcpu_alloc_pages mm/percpu-vm.c:93 [inline] [] pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282 [] pcpu_alloc+0xe01/0x1280 mm/percpu.c:998 [] __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1062 [] bpf_array_alloc_percpu kernel/bpf/arraymap.c:34 [inline] [] array_map_alloc+0x532/0x710 kernel/bpf/arraymap.c:99 [] find_and_alloc_map kernel/bpf/syscall.c:34 [inline] [] map_create kernel/bpf/syscall.c:188 [inline] [] SYSC_bpf kernel/bpf/syscall.c:870 [inline] [] SyS_bpf+0xd64/0x2500 
kernel/bpf/syscall.c:827 [] entry_SYSCALL_64_fastpath+0x1f/0xc2 pcpu_alloc pcpu_alloc_mutex drain_all_pages get_online_cpus cpu_hotplug.lock [] cpu_hotplug_begin+0x206/0x2e0 kernel/cpu.c:304 [] _cpu_up+0xca/0x2a0 kernel/cpu.c:1011 [] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087 [] cpu_up+0x18/0x20 kernel/cpu.c:1095 [] smp_init+0xe9/0xee kernel/smp.c:564 [] kernel_init_freeable+0x439/0x690 init/main.c:1010 [] kernel_init+0x13/0x180 init/main.c:941 [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433 cpu_hotplug_begin cpu_hotplug.lock Pulling cpu hotplug locks inside the page allocator is just too dangerous. Let's remove the dependency by dropping get_online_cpus() from drain_all_pages. This is not so simple though because now we do not have a protection against cpu hotplug which means 2 things: - the work item might be executed on a different cpu in worker from unbound pool so it doesn't run on pinned on the cpu - we have to make sure that we do not race with page_alloc_cpu_dead calling drain_pages_zone Disabling preemption in drain_local_pages_wq will solve the first problem drain_local_pages will determine its local CPU from the WQ context which will be stable after that point, page_alloc_cpu_dead is pinned to the CPU already. The later condition is achieved by disabling IRQs in drain_pages_zone. Reported-by: Dmitry Vyukov Signed-off-by: Michal Hocko --- mm/page_alloc.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c3358d4f7932..b6411816787a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2343,7 +2343,16 @@ void drain_local_pages(struct zone *zone) static void drain_local_pages_wq(st
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 13:58:46, Mel Gorman wrote: > On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote: [...] > > Anyway, shouldn't be it sufficient to disable preemption > > on drain_local_pages_wq? > > That would be sufficient for a hot-removed CPU moving the drain request > to another CPU and avoiding any scheduling events. > > > The CPU hotplug callback will not preempt us > > and so we cannot work on the same cpus, right? > > > > I don't see a specific guarantee that it cannot be preempted and it > would depend on an the exact cpu hotplug implementation which is subject > to quite a lot of change. But we do not care about the whole cpu hotplug code. The only part we really do care about is the race inside drain_pages_zone and that will run in an atomic context on the specific CPU. You are absolutely right that using the mutex is safe as well but the hotplug path is already littered with locks and adding one more to the picture doesn't sound great to me. So I would really like to not use a lock if that is possible and safe (with a big fat comment of course). -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, Feb 07, 2017 at 01:37:08PM +0100, Michal Hocko wrote: > > You cannot put sleepable lock inside the preempt disabled section... > > We can make it a spinlock right? > > Scratch that! For some reason I thought that cpu notifiers are run in an > atomic context. Now that I am checking the code again it turns out I was > wrong. __cpu_notify uses __raw_notifier_call_chain so this is not an > atomic context. Indeed. > Anyway, shouldn't it be sufficient to disable preemption > on drain_local_pages_wq? That would be sufficient for a hot-removed CPU moving the drain request to another CPU and avoiding any scheduling events. > The CPU hotplug callback will not preempt us > and so we cannot work on the same cpus, right? > I don't see a specific guarantee that it cannot be preempted and it would depend on the exact cpu hotplug implementation which is subject to quite a lot of change. Hence, the mutex provides a guarantee that the hot-removed CPU teardown cannot run on the same CPU as a workqueue drain running on a CPU it was not originally scheduled for. -- Mel Gorman SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On 02/07/2017 01:48 PM, Michal Hocko wrote: On Tue 07-02-17 13:43:39, Vlastimil Babka wrote: [...] > Anyway, shouldn't it be sufficient to disable preemption > on drain_local_pages_wq? The CPU hotplug callback will not preempt us > and so we cannot work on the same cpus, right? I thought the problem here was that the callback races with the work item that has been migrated to a different cpu. Once we are not working on the local cpu, disabling preempt/irq's won't help? If the worker is racing with the callback then only one of them can run on a _particular_ cpu. So they cannot race. Or am I missing something? Ah I forgot that a migrated work item will in fact run on the local cpu. So it looks like nobody should race with the callback indeed (assuming that when the callback is called, the cpu in question already isn't executing workqueue workers).
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 13:03:50, Mel Gorman wrote: > On Tue, Feb 07, 2017 at 12:43:27PM +0100, Michal Hocko wrote: > > > Right. The unbind operation can set a mask that is any allowable CPU and > > > the final process_work is not done in a context that prevents > > > preemption. > > > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index 3b93879990fd..7af165d308c4 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone) > > > > > > static void drain_local_pages_wq(struct work_struct *work) > > > { > > > + /* > > > + * Ordinarily a drain operation is bound to a CPU but may be unbound > > > + * after a CPU hotplug operation so it's necessary to disable > > > + * preemption for the drain to stabilise the CPU ID. > > > + */ > > > + preempt_disable(); > > > drain_local_pages(NULL); > > > + preempt_enable_no_resched(); > > > } > > > > > > /* > > [...] > > > @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu) > > > { > > > > > > lru_add_drain_cpu(cpu); > > > + > > > + /* > > > + * A per-cpu drain via a workqueue from drain_all_pages can be > > > + * rescheduled onto an unrelated CPU. That allows the hotplug > > > + * operation and the drain to potentially race on the same > > > + * CPU. Serialise hotplug versus drain using pcpu_drain_mutex > > > + */ > > > + mutex_lock(&pcpu_drain_mutex); > > > drain_pages(cpu); > > > + mutex_unlock(&pcpu_drain_mutex); > > > > You cannot put sleepable lock inside the preempt disbaled section... > > We can make it a spinlock right? > > > > The CPU down callback can hold a mutex and at least he SLUB callback > already does so. 
That gives > > page_alloc_cpu_dead > mutex_lock > drain_pages > mutex_unlock > > drain_all_pages > mutex_lock > queue workqueue > drain_local_pages_wq > preempt_disable > drain_local_pages > drain_pages > preempt_enable >flush queues > mutex_unlock > > I must be blind or maybe it's rushing between multiple concerns but which > sleepable lock is of concern? I thought the cpu hotplug callback was non-preemptible. This is not the case as mentioned in other reply. The pcpu_drain_mutex in the hotplug callback is alright. Sorry about the confusion! I am still wondering whether the lock is really needed. See the other reply. -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, Feb 07, 2017 at 12:43:27PM +0100, Michal Hocko wrote: > > Right. The unbind operation can set a mask that is any allowable CPU and > > the final process_work is not done in a context that prevents > > preemption. > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 3b93879990fd..7af165d308c4 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone) > > > > static void drain_local_pages_wq(struct work_struct *work) > > { > > + /* > > +* Ordinarily a drain operation is bound to a CPU but may be unbound > > +* after a CPU hotplug operation so it's necessary to disable > > +* preemption for the drain to stabilise the CPU ID. > > +*/ > > + preempt_disable(); > > drain_local_pages(NULL); > > + preempt_enable_no_resched(); > > } > > > > /* > [...] > > @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu) > > { > > > > lru_add_drain_cpu(cpu); > > + > > + /* > > +* A per-cpu drain via a workqueue from drain_all_pages can be > > +* rescheduled onto an unrelated CPU. That allows the hotplug > > +* operation and the drain to potentially race on the same > > +* CPU. Serialise hotplug versus drain using pcpu_drain_mutex > > +*/ > > + mutex_lock(&pcpu_drain_mutex); > > drain_pages(cpu); > > + mutex_unlock(&pcpu_drain_mutex); > > You cannot put sleepable lock inside the preempt disabled section... > We can make it a spinlock right? > The CPU down callback can hold a mutex and at least the SLUB callback already does so. That gives

page_alloc_cpu_dead
  mutex_lock
  drain_pages
  mutex_unlock

drain_all_pages
  mutex_lock
  queue workqueue
    drain_local_pages_wq
      preempt_disable
      drain_local_pages
        drain_pages
      preempt_enable
  flush queues
  mutex_unlock

I must be blind or maybe it's rushing between multiple concerns but which sleepable lock is of concern? -- Mel Gorman SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 13:43:39, Vlastimil Babka wrote: [...] > > Anyway, shouldn't it be sufficient to disable preemption > > on drain_local_pages_wq? The CPU hotplug callback will not preempt us > > and so we cannot work on the same cpus, right? > > I thought the problem here was that the callback races with the work item > that has been migrated to a different cpu. Once we are not working on the > local cpu, disabling preempt/irq's won't help? If the worker is racing with the callback then only one of them can run on a _particular_ cpu. So they cannot race. Or am I missing something? -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On 02/07/2017 01:37 PM, Michal Hocko wrote: > @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu) > { > >lru_add_drain_cpu(cpu); > + > + /* > + * A per-cpu drain via a workqueue from drain_all_pages can be > + * rescheduled onto an unrelated CPU. That allows the hotplug > + * operation and the drain to potentially race on the same > + * CPU. Serialise hotplug versus drain using pcpu_drain_mutex > + */ > + mutex_lock(&pcpu_drain_mutex); >drain_pages(cpu); > + mutex_unlock(&pcpu_drain_mutex); You cannot put sleepable lock inside the preempt disbaled section... We can make it a spinlock right? Scratch that! For some reason I thought that cpu notifiers are run in an atomic context. Now that I am checking the code again it turns out I was wrong. __cpu_notify uses __raw_notifier_call_chain so this is not an atomic context. Good. Anyway, shouldn't be it sufficient to disable preemption on drain_local_pages_wq? The CPU hotplug callback will not preempt us and so we cannot work on the same cpus, right? I thought the problem here was that the callback races with the work item that has been migrated to a different cpu. Once we are not working on the local cpu, disabling preempt/irq's won't help?
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 12:43:27, Michal Hocko wrote: > On Tue 07-02-17 11:34:35, Mel Gorman wrote: > > On Tue, Feb 07, 2017 at 11:35:52AM +0100, Michal Hocko wrote: > > > On Tue 07-02-17 10:28:09, Mel Gorman wrote: > > > > On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote: > > > > > On 02/07/2017 10:43 AM, Mel Gorman wrote: > > > > > > If I'm reading this right, a hot-remove will set the pool > > > > > > POOL_DISASSOCIATED > > > > > > and unbound. A workqueue queued for draining get migrated during > > > > > > hot-remove > > > > > > and a drain operation will execute twice on a CPU -- one for what > > > > > > was > > > > > > queued and a second time for the CPU it was migrated from. It > > > > > > should still > > > > > > work with flush_work which doesn't appear to block forever if an > > > > > > item > > > > > > got migrated to another workqueue. The actual drain workqueue > > > > > > function is > > > > > > using the CPU ID it's currently running on so it shouldn't get > > > > > > confused. > > > > > > > > > > Is the worker that will process this migrated workqueue also > > > > > guaranteed > > > > > to be pinned to a cpu for the whole work, though? drain_local_pages() > > > > > needs that guarantee. > > > > > > > > > > > > > It should be by running on a workqueue handler bound to that CPU (queued > > > > on wq->cpu_pwqs in __queue_work) > > > > > > Are you sure? The comment in kernel/workqueue.c says > > > * While DISASSOCIATED, the cpu may be offline and all workers > > > have > > > * %WORKER_UNBOUND set and concurrency management disabled, and > > > may > > > * be executing on any CPU. The pool behaves as an unbound one. > > > > > > I might be misreadig but an unbound pool can be handled by workers which > > > are not pinned on any cpu AFAIU. > > > > Right. The unbind operation can set a mask that is any allowable CPU and > > the final process_work is not done in a context that prevents > > preemption. 
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index 3b93879990fd..7af165d308c4 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone) > > > > static void drain_local_pages_wq(struct work_struct *work) > > { > > + /* > > +* Ordinarily a drain operation is bound to a CPU but may be unbound > > +* after a CPU hotplug operation so it's necessary to disable > > +* preemption for the drain to stabilise the CPU ID. > > +*/ > > + preempt_disable(); > > drain_local_pages(NULL); > > + preempt_enable_no_resched(); > > } > > > > /* > [...] > > @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu) > > { > > > > lru_add_drain_cpu(cpu); > > + > > + /* > > +* A per-cpu drain via a workqueue from drain_all_pages can be > > +* rescheduled onto an unrelated CPU. That allows the hotplug > > +* operation and the drain to potentially race on the same > > +* CPU. Serialise hotplug versus drain using pcpu_drain_mutex > > +*/ > > + mutex_lock(&pcpu_drain_mutex); > > drain_pages(cpu); > > + mutex_unlock(&pcpu_drain_mutex); > > You cannot put sleepable lock inside the preempt disabled section... > We can make it a spinlock right? Scratch that! For some reason I thought that cpu notifiers are run in an atomic context. Now that I am checking the code again it turns out I was wrong. __cpu_notify uses __raw_notifier_call_chain so this is not an atomic context. Anyway, shouldn't it be sufficient to disable preemption on drain_local_pages_wq? The CPU hotplug callback will not preempt us and so we cannot work on the same cpus, right? -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 12:54:48, Vlastimil Babka wrote: > On 02/07/2017 12:43 PM, Michal Hocko wrote: > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > > index 3b93879990fd..7af165d308c4 100644 > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone) > > > > > > static void drain_local_pages_wq(struct work_struct *work) > > > { > > > + /* > > > + * Ordinarily a drain operation is bound to a CPU but may be unbound > > > + * after a CPU hotplug operation so it's necessary to disable > > > + * preemption for the drain to stabilise the CPU ID. > > > + */ > > > + preempt_disable(); > > > drain_local_pages(NULL); > > > + preempt_enable_no_resched(); > > > } > > > > > > /* > > [...] > > > @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu) > > > { > > > > > > lru_add_drain_cpu(cpu); > > > + > > > + /* > > > + * A per-cpu drain via a workqueue from drain_all_pages can be > > > + * rescheduled onto an unrelated CPU. That allows the hotplug > > > + * operation and the drain to potentially race on the same > > > + * CPU. Serialise hotplug versus drain using pcpu_drain_mutex > > > + */ > > > + mutex_lock(&pcpu_drain_mutex); > > > drain_pages(cpu); > > > + mutex_unlock(&pcpu_drain_mutex); > > > > You cannot put sleepable lock inside the preempt disbaled section... > > We can make it a spinlock right? > > Could we do flush_work() with a spinlock? Sounds bad too. We surely cannot. I thought the lock would be gone in drain_all_pages, we would deadlock with the lock there anyway... But it is true that we would need a way to only allow one caller to get in. This is getting messier and messier... > Maybe we could just use the fact that the whole drain happens with disabled > irq's and obtain the current cpu under that protection? preempt_disable should be enough, no? The CPU callback is not called from an IRQ context, right? 
--- diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1ee49474207e..4a9a65479435 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2343,7 +2343,16 @@ void drain_local_pages(struct zone *zone) static void drain_local_pages_wq(struct work_struct *work) { + /* +* drain_all_pages doesn't use proper cpu hotplug protection so +* we can race with cpu offline when the WQ can move this from +* a cpu pinned worker to an unbound one. We can operate on a different +* cpu which is allright but we also have to make sure to not move to +* a different one. +*/ + preempt_disable(); drain_local_pages(NULL); + preempt_enable(); } /* @@ -2379,12 +2388,6 @@ void drain_all_pages(struct zone *zone) } /* -* As this can be called from reclaim context, do not reenter reclaim. -* An allocation failure can be handled, it's simply slower -*/ - get_online_cpus(); - - /* * We don't care about racing with CPU hotplug event * as offline notification will cause the notified * cpu to drain that CPU pcps and on_each_cpu_mask @@ -2423,7 +2426,6 @@ void drain_all_pages(struct zone *zone) for_each_cpu(cpu, &cpus_with_pcps) flush_work(per_cpu_ptr(&pcpu_drain, cpu)); - put_online_cpus(); mutex_unlock(&pcpu_drain_mutex); } -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On 02/07/2017 12:43 PM, Michal Hocko wrote: diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3b93879990fd..7af165d308c4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone) static void drain_local_pages_wq(struct work_struct *work) { + /* +* Ordinarily a drain operation is bound to a CPU but may be unbound +* after a CPU hotplug operation so it's necessary to disable +* preemption for the drain to stabilise the CPU ID. +*/ + preempt_disable(); drain_local_pages(NULL); + preempt_enable_no_resched(); } /* [...] @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu) { lru_add_drain_cpu(cpu); + + /* +* A per-cpu drain via a workqueue from drain_all_pages can be +* rescheduled onto an unrelated CPU. That allows the hotplug +* operation and the drain to potentially race on the same +* CPU. Serialise hotplug versus drain using pcpu_drain_mutex +*/ + mutex_lock(&pcpu_drain_mutex); drain_pages(cpu); + mutex_unlock(&pcpu_drain_mutex); You cannot put sleepable lock inside the preempt disbaled section... We can make it a spinlock right? Could we do flush_work() with a spinlock? Sounds bad too. Maybe we could just use the fact that the whole drain happens with disabled irq's and obtain the current cpu under that protection?
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 11:34:35, Mel Gorman wrote: > On Tue, Feb 07, 2017 at 11:35:52AM +0100, Michal Hocko wrote: > > On Tue 07-02-17 10:28:09, Mel Gorman wrote: > > > On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote: > > > > On 02/07/2017 10:43 AM, Mel Gorman wrote: > > > > > If I'm reading this right, a hot-remove will set the pool > > > > > POOL_DISASSOCIATED > > > > > and unbound. A workqueue queued for draining get migrated during > > > > > hot-remove > > > > > and a drain operation will execute twice on a CPU -- one for what was > > > > > queued and a second time for the CPU it was migrated from. It should > > > > > still > > > > > work with flush_work which doesn't appear to block forever if an item > > > > > got migrated to another workqueue. The actual drain workqueue > > > > > function is > > > > > using the CPU ID it's currently running on so it shouldn't get > > > > > confused. > > > > > > > > Is the worker that will process this migrated workqueue also guaranteed > > > > to be pinned to a cpu for the whole work, though? drain_local_pages() > > > > needs that guarantee. > > > > > > > > > > It should be by running on a workqueue handler bound to that CPU (queued > > > on wq->cpu_pwqs in __queue_work) > > > > Are you sure? The comment in kernel/workqueue.c says > > * While DISASSOCIATED, the cpu may be offline and all workers have > > * %WORKER_UNBOUND set and concurrency management disabled, and may > > * be executing on any CPU. The pool behaves as an unbound one. > > > > I might be misreadig but an unbound pool can be handled by workers which > > are not pinned on any cpu AFAIU. > > Right. The unbind operation can set a mask that is any allowable CPU and > the final process_work is not done in a context that prevents > preemption. 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3b93879990fd..7af165d308c4 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone) > > static void drain_local_pages_wq(struct work_struct *work) > { > + /* > + * Ordinarily a drain operation is bound to a CPU but may be unbound > + * after a CPU hotplug operation so it's necessary to disable > + * preemption for the drain to stabilise the CPU ID. > + */ > + preempt_disable(); > drain_local_pages(NULL); > + preempt_enable_no_resched(); > } > > /* [...] > @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu) > { > > lru_add_drain_cpu(cpu); > + > + /* > + * A per-cpu drain via a workqueue from drain_all_pages can be > + * rescheduled onto an unrelated CPU. That allows the hotplug > + * operation and the drain to potentially race on the same > + * CPU. Serialise hotplug versus drain using pcpu_drain_mutex > + */ > + mutex_lock(&pcpu_drain_mutex); > drain_pages(cpu); > + mutex_unlock(&pcpu_drain_mutex); You cannot put sleepable lock inside the preempt disabled section... We can make it a spinlock right? -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, Feb 07, 2017 at 11:35:52AM +0100, Michal Hocko wrote: > On Tue 07-02-17 10:28:09, Mel Gorman wrote: > > On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote: > > > On 02/07/2017 10:43 AM, Mel Gorman wrote: > > > > If I'm reading this right, a hot-remove will set the pool > > > > POOL_DISASSOCIATED > > > > and unbound. A workqueue queued for draining get migrated during > > > > hot-remove > > > > and a drain operation will execute twice on a CPU -- one for what was > > > > queued and a second time for the CPU it was migrated from. It should > > > > still > > > > work with flush_work which doesn't appear to block forever if an item > > > > got migrated to another workqueue. The actual drain workqueue function > > > > is > > > > using the CPU ID it's currently running on so it shouldn't get confused. > > > > > > Is the worker that will process this migrated workqueue also guaranteed > > > to be pinned to a cpu for the whole work, though? drain_local_pages() > > > needs that guarantee. > > > > > > > It should be by running on a workqueue handler bound to that CPU (queued > > on wq->cpu_pwqs in __queue_work) > > Are you sure? The comment in kernel/workqueue.c says > * While DISASSOCIATED, the cpu may be offline and all workers have > * %WORKER_UNBOUND set and concurrency management disabled, and may > * be executing on any CPU. The pool behaves as an unbound one. > > I might be misreadig but an unbound pool can be handled by workers which > are not pinned on any cpu AFAIU. Right. The unbind operation can set a mask that is any allowable CPU and the final process_work is not done in a context that prevents preemption. 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3b93879990fd..7af165d308c4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2342,7 +2342,14 @@ void drain_local_pages(struct zone *zone) static void drain_local_pages_wq(struct work_struct *work) { + /* +* Ordinarily a drain operation is bound to a CPU but may be unbound +* after a CPU hotplug operation so it's necessary to disable +* preemption for the drain to stabilise the CPU ID. +*/ + preempt_disable(); drain_local_pages(NULL); + preempt_enable_no_resched(); } /* @@ -2377,13 +2384,10 @@ void drain_all_pages(struct zone *zone) mutex_lock(&pcpu_drain_mutex); } - get_online_cpus(); - /* -* We don't care about racing with CPU hotplug event -* as offline notification will cause the notified -* cpu to drain that CPU pcps and on_each_cpu_mask -* disables preemption as part of its processing +* We don't care about racing with CPU hotplug event as offline +* notification will cause the notified cpu to drain that CPU pcps +* and it is serialised against here via pcpu_drain_mutex. */ for_each_online_cpu(cpu) { struct per_cpu_pageset *pcp; @@ -2418,7 +2422,6 @@ void drain_all_pages(struct zone *zone) for_each_cpu(cpu, &cpus_with_pcps) flush_work(per_cpu_ptr(&pcpu_drain, cpu)); - put_online_cpus(); mutex_unlock(&pcpu_drain_mutex); } @@ -6711,7 +6714,16 @@ static int page_alloc_cpu_dead(unsigned int cpu) { lru_add_drain_cpu(cpu); + + /* +* A per-cpu drain via a workqueue from drain_all_pages can be +* rescheduled onto an unrelated CPU. That allows the hotplug +* operation and the drain to potentially race on the same +* CPU. Serialise hotplug versus drain using pcpu_drain_mutex +*/ + mutex_lock(&pcpu_drain_mutex); drain_pages(cpu); + mutex_unlock(&pcpu_drain_mutex); /* * Spill the event counters of the dead processor -- Mel Gorman SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On 2017/02/07 7:05, Mel Gorman wrote: > On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote: >> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote: >>> On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote: On 29.1.2017 13:44, Dmitry Vyukov wrote: > Hello, > > I've got the following deadlock report while running syzkaller fuzzer > on f37208bc3c9c2f811460ef264909dfbc7f605a60: > > [ INFO: possible circular locking dependency detected ] > 4.10.0-rc5-next-20170125 #1 Not tainted > --- > syz-executor3/14255 is trying to acquire lock: > (cpu_hotplug.dep_map){++}, at: [] > get_online_cpus+0x37/0x90 kernel/cpu.c:239 > > but task is already holding lock: > (pcpu_alloc_mutex){+.+.+.}, at: [] > pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897 > > which lock already depends on the new lock. I suspect the dependency comes from recent changes in drain_all_pages(). They were later redone (for other reasons, but nice to have another validation) in the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next. Could you try if it helps? >>> >>> It happened only once on linux-next, so I can't verify the fix. But I >>> will watch out for other occurrences. >> >> Unfortunately it does not seem to help. > > I'm a little stuck on how to best handle this. get_online_cpus() can > halt forever if the hotplug operation is holding the mutex when calling > pcpu_alloc. One option would be to add a try_get_online_cpus() helper which > trylocks the mutex. However, given that drain is so unlikely to actually > make that make a difference when racing against parallel allocations, > I think this should be acceptable. > > Any objections? Why is the change below on top of current linux.git (8b1b41ee74f9) insufficient? I think it can eliminate a lot of IPIs without introducing lockdep warnings. 
-- diff --git a/mm/page_alloc.c b/mm/page_alloc.c index f3e0c69..ae6e7aa 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2354,6 +2354,7 @@ void drain_local_pages(struct zone *zone) */ void drain_all_pages(struct zone *zone) { + static DEFINE_MUTEX(lock); int cpu; /* @@ -2362,6 +2363,7 @@ void drain_all_pages(struct zone *zone) */ static cpumask_t cpus_with_pcps; + mutex_lock(&lock); /* * We don't care about racing with CPU hotplug event * as offline notification will cause the notified @@ -2394,6 +2396,7 @@ void drain_all_pages(struct zone *zone) } on_each_cpu_mask(&cpus_with_pcps, (smp_call_func_t) drain_local_pages, zone, 1); + mutex_unlock(&lock); } #ifdef CONFIG_HIBERNATION -- By the way, drain_all_pages() is a sleepable context, isn't it? I don't get a soft lockup using current linux.git (8b1b41ee74f9). But I trivially get a soft lockup if I try the above change after reverting "mm, page_alloc: use static global work_struct for draining per-cpu pages" and "mm, page_alloc: drain per-cpu pages from workqueue context" on linux-next-20170207. List corruption also came in with linux-next-20170207. -- [ 32.672890] ip6_tables: (C) 2000-2006 Netfilter Core Team [ 33.109860] Ebtables v2.0 registered [ 33.410293] nf_conntrack version 0.5.0 (16384 buckets, 65536 max) [ 33.935512] IPv6: ADDRCONF(NETDEV_UP): eno1628: link is not ready [ 33.937478] e1000: eno1628 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None [ 33.939777] IPv6: ADDRCONF(NETDEV_CHANGE): eno1628: link becomes ready [ 34.194000] Netfilter messages via NETLINK v0.30. [ 34.258828] ip_set: protocol 6 [ 38.518375] nf_conntrack: default automatic helper assignment has been turned off for security reasons and CT-based firewall rule not found. Use the iptables CT target to attach helpers instead. 
[ 43.911609] cp (5167) used greatest stack depth: 10488 bytes left [ 48.174125] cp (5860) used greatest stack depth: 10224 bytes left [ 101.075280] [ cut here ] [ 101.076743] WARNING: CPU: 0 PID: 9766 at lib/list_debug.c:25 __list_add_valid+0x46/0xa0 [ 101.079095] list_add corruption. next->prev should be prev (ea00013dfce0), but was 88006d3e1d40. (next=88006d3e1d40). [ 101.081863] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ipt_REJECT nf_reject_ipv4 ip6t_REJECT nf_reject_ipv6 xt_conntrack ip_set nfnetlink ebtable_nat ebtable_broute bridge stp llc ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd ppdev glue_he
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, Feb 07, 2017 at 10:42:49AM +, Mel Gorman wrote: > On Tue, Feb 07, 2017 at 10:23:31AM +0100, Vlastimil Babka wrote: > > > cpu offlining. I have to check the code but my impression was that WQ > > > code will ignore the cpu requested by the work item when the cpu is > > > going offline. If the offline happens while the worker function already > > > executes then it has to wait as we run with preemption disabled so we > > > should be safe here. Or am I missing something obvious? > > > > Tejun suggested an alternative solution to avoiding get_online_cpus() in > > this thread: > > https://lkml.kernel.org/r/<20170123170329.ga7...@htj.duckdns.org> > > But it would look like the following as it could be serialised against > pcpu_drain_mutex as the cpu hotplug teardown callback is allowed to sleep. > Bah, this is obviously unsafe. It's guaranteed to deadlock. -- Mel Gorman SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, Feb 07, 2017 at 10:23:31AM +0100, Vlastimil Babka wrote: > > cpu offlining. I have to check the code but my impression was that WQ > > code will ignore the cpu requested by the work item when the cpu is > > going offline. If the offline happens while the worker function already > > executes then it has to wait as we run with preemption disabled so we > > should be safe here. Or am I missing something obvious? > > Tejun suggested an alternative solution to avoiding get_online_cpus() in > this thread: > https://lkml.kernel.org/r/<20170123170329.ga7...@htj.duckdns.org> But it would look like the following as it could be serialised against pcpu_drain_mutex as the cpu hotplug teardown callback is allowed to sleep. diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3b93879990fd..8cd8b1bbe00c 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2319,9 +2319,17 @@ static void drain_pages(unsigned int cpu) { struct zone *zone; + /* +* A per-cpu drain via a workqueue from drain_all_pages can be +* rescheduled onto an unrelated CPU. That allows the hotplug +* operation and the drain to potentially race on the same +* CPU. Serialise hotplug versus drain using pcpu_drain_mutex +*/ + mutex_lock(&pcpu_drain_mutex); for_each_populated_zone(zone) { drain_pages_zone(cpu, zone); } + mutex_unlock(&pcpu_drain_mutex); } /* @@ -2377,13 +2385,10 @@ void drain_all_pages(struct zone *zone) mutex_lock(&pcpu_drain_mutex); } - get_online_cpus(); - /* -* We don't care about racing with CPU hotplug event -* as offline notification will cause the notified -* cpu to drain that CPU pcps and on_each_cpu_mask -* disables preemption as part of its processing +* We don't care about racing with CPU hotplug event as offline +* notification will cause the notified cpu to drain that CPU pcps +* and it is serialised against here via pcpu_drain_mutex. 
*/ for_each_online_cpu(cpu) { struct per_cpu_pageset *pcp; @@ -2418,7 +2423,6 @@ void drain_all_pages(struct zone *zone) for_each_cpu(cpu, &cpus_with_pcps) flush_work(per_cpu_ptr(&pcpu_drain, cpu)); - put_online_cpus(); mutex_unlock(&pcpu_drain_mutex); } -- Mel Gorman SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 10:28:09, Mel Gorman wrote: > On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote: > > On 02/07/2017 10:43 AM, Mel Gorman wrote: > > > If I'm reading this right, a hot-remove will set the pool > > > POOL_DISASSOCIATED > > > and unbound. A workqueue queued for draining get migrated during > > > hot-remove > > > and a drain operation will execute twice on a CPU -- one for what was > > > queued and a second time for the CPU it was migrated from. It should still > > > work with flush_work which doesn't appear to block forever if an item > > > got migrated to another workqueue. The actual drain workqueue function is > > > using the CPU ID it's currently running on so it shouldn't get confused. > > > > Is the worker that will process this migrated workqueue also guaranteed > > to be pinned to a cpu for the whole work, though? drain_local_pages() > > needs that guarantee. > > > > It should be by running on a workqueue handler bound to that CPU (queued > on wq->cpu_pwqs in __queue_work) Are you sure? The comment in kernel/workqueue.c says * While DISASSOCIATED, the cpu may be offline and all workers have * %WORKER_UNBOUND set and concurrency management disabled, and may * be executing on any CPU. The pool behaves as an unbound one. I might be misreading but an unbound pool can be handled by workers which are not pinned on any cpu AFAIU. -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, Feb 07, 2017 at 10:49:28AM +0100, Vlastimil Babka wrote: > On 02/07/2017 10:43 AM, Mel Gorman wrote: > > If I'm reading this right, a hot-remove will set the pool POOL_DISASSOCIATED > > and unbound. A workqueue queued for draining get migrated during hot-remove > > and a drain operation will execute twice on a CPU -- one for what was > > queued and a second time for the CPU it was migrated from. It should still > > work with flush_work which doesn't appear to block forever if an item > > got migrated to another workqueue. The actual drain workqueue function is > > using the CPU ID it's currently running on so it shouldn't get confused. > > Is the worker that will process this migrated workqueue also guaranteed > to be pinned to a cpu for the whole work, though? drain_local_pages() > needs that guarantee. > It should be by running on a workqueue handler bound to that CPU (queued on wq->cpu_pwqs in __queue_work) -- Mel Gorman SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 10:49:28, Vlastimil Babka wrote: > On 02/07/2017 10:43 AM, Mel Gorman wrote: > > If I'm reading this right, a hot-remove will set the pool POOL_DISASSOCIATED > > and unbound. A workqueue queued for draining get migrated during hot-remove > > and a drain operation will execute twice on a CPU -- one for what was > > queued and a second time for the CPU it was migrated from. It should still > > work with flush_work which doesn't appear to block forever if an item > > got migrated to another workqueue. The actual drain workqueue function is > > using the CPU ID it's currently running on so it shouldn't get confused. > > Is the worker that will process this migrated workqueue also guaranteed > to be pinned to a cpu for the whole work, though? drain_local_pages() > needs that guarantee. Yeah I guess you are right. This would mean that drain_local_pages_wq should do preempt_{disable,enable} around drain_local_pages. > > > Tejun, did I miss anything? Does a workqueue item queued on a CPU being > > offline get unbound and a caller can still flush it safely? In this > > specific case, it's ok that the workqueue item does not run on the CPU it > > was queued on. I guess we need to do one more step and ensure that our (rebound) worker doesn't race with the page_alloc_cpu_notify. I guess we can just cmpxchg pcp->count in drain_pages_zone to ensure the exclusivity. Not as simple as I originally thought but doable I guess and definitely better than making a subtle dependency on the hotplug locks which is just a PITA to maintain. -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue 07-02-17 10:23:31, Vlastimil Babka wrote: > On 02/07/2017 09:48 AM, Michal Hocko wrote: > > On Mon 06-02-17 22:05:30, Mel Gorman wrote: > >>> Unfortunately it does not seem to help. > >> > >> I'm a little stuck on how to best handle this. get_online_cpus() can > >> halt forever if the hotplug operation is holding the mutex when calling > >> pcpu_alloc. One option would be to add a try_get_online_cpus() helper which > >> trylocks the mutex. However, given that drain is so unlikely to actually > >> make that make a difference when racing against parallel allocations, > >> I think this should be acceptable. > >> > >> Any objections? > >> > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c > >> index 3b93879990fd..a3192447e906 100644 > >> --- a/mm/page_alloc.c > >> +++ b/mm/page_alloc.c > >> @@ -3432,7 +3432,17 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, > >> unsigned int order, > >> */ > >>if (!page && !drained) { > >>unreserve_highatomic_pageblock(ac, false); > >> - drain_all_pages(NULL); > >> + > >> + /* > >> + * Only drain from contexts allocating for user allocations. > >> + * Kernel allocations could be holding a CPU hotplug-related > >> + * mutex, particularly hot-add allocating per-cpu structures > >> + * while hotplug-related mutex's are held which would prevent > >> + * get_online_cpus ever returning. > >> + */ > >> + if (gfp_mask & __GFP_HARDWALL) > >> + drain_all_pages(NULL); > >> + > > > > This wouldn't work AFAICS. If you look at the lockdep splat, the path > > which reverses the locking order (takes pcpu_alloc_mutex prior to > > cpu_hotplug.lock is bpf_array_alloc_percpu which is GFP_USER and thus > > __GFP_HARDWALL. > > > > I believe we shouldn't pull any dependency on the hotplug locks inside > > the allocator. This is just too fragile! Can we simply drop the > > get_online_cpus()? Why do we need it, anyway? 
Say we are racing with the > > It was added after I noticed in review that queue_work_on() has a > comment that caller must ensure that cpu can't go away, and wondered > about it. Ohh, I hadn't noticed the comment. Thanks for pointing it out. I still do not see what a missing get_online_cpus would mean for queuing, though. > Also noted that a similar lru_add_drain_all() does it too. > > > cpu offlining. I have to check the code but my impression was that WQ > > code will ignore the cpu requested by the work item when the cpu is > > going offline. If the offline happens while the worker function already > > executes then it has to wait as we run with preemption disabled so we > > should be safe here. Or am I missing something obvious? > > Tejun suggested an alternative solution to avoiding get_online_cpus() in > this thread: > https://lkml.kernel.org/r/<20170123170329.ga7...@htj.duckdns.org> OK, so we have page_alloc_cpu_notify which also does drain_pages so all we have to do to make sure they do not race is to synchronize there. -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On 02/07/2017 10:43 AM, Mel Gorman wrote: > If I'm reading this right, a hot-remove will set the pool POOL_DISASSOCIATED > and unbound. A workqueue queued for draining get migrated during hot-remove > and a drain operation will execute twice on a CPU -- one for what was > queued and a second time for the CPU it was migrated from. It should still > work with flush_work which doesn't appear to block forever if an item > got migrated to another workqueue. The actual drain workqueue function is > using the CPU ID it's currently running on so it shouldn't get confused. Is the worker that will process this migrated workqueue also guaranteed to be pinned to a cpu for the whole work, though? drain_local_pages() needs that guarantee. > Tejun, did I miss anything? Does a workqueue item queued on a CPU being > offline get unbound and a caller can still flush it safely? In this > specific case, it's ok that the workqueue item does not run on the CPU it > was queued on. >
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, Feb 07, 2017 at 10:23:31AM +0100, Vlastimil Babka wrote: > > > cpu offlining. I have to check the code but my impression was that WQ > > code will ignore the cpu requested by the work item when the cpu is > > going offline. If the offline happens while the worker function already > > executes then it has to wait as we run with preemption disabled so we > > should be safe here. Or am I missing something obvious? > > Tejun suggested an alternative solution to avoiding get_online_cpus() in > this thread: > https://lkml.kernel.org/r/<20170123170329.ga7...@htj.duckdns.org> I was hoping it would not really be necessary to sync against the cpu offline callback if we know we don't really have to guarantee we run on the correct CPU. -- Mel Gorman SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Tue, Feb 07, 2017 at 09:48:56AM +0100, Michal Hocko wrote: > > + > > + /* > > +* Only drain from contexts allocating for user allocations. > > +* Kernel allocations could be holding a CPU hotplug-related > > +* mutex, particularly hot-add allocating per-cpu structures > > +* while hotplug-related mutex's are held which would prevent > > +* get_online_cpus ever returning. > > +*/ > > + if (gfp_mask & __GFP_HARDWALL) > > + drain_all_pages(NULL); > > + > > This wouldn't work AFAICS. If you look at the lockdep splat, the path > which reverses the locking order (takes pcpu_alloc_mutex prior to > cpu_hotplug.lock is bpf_array_alloc_percpu which is GFP_USER and thus > __GFP_HARDWALL. > You're right, I looked at the wrong caller. > I believe we shouldn't pull any dependency on the hotplug locks inside > the allocator. This is just too fragile! Can we simply drop the > get_online_cpus()? Why do we need it, anyway? To stop the CPU being queued going offline. Another alternative is bringing back try_get_online_cpus() which, as Peter pointed out to me offline, was removed in commit 02ef3c4a2aae65a1632b27770bfea3f83ca06772, although it'd be a shame for just this case. > Say we are racing with the > cpu offlining. I have to check the code but my impression was that WQ > code will ignore the cpu requested by the work item when the cpu is > going offline. If the offline happens while the worker function already > executes then it has to wait as we run with preemption disabled so we > should be safe here. Or am I missing something obvious? It may be safe but I'd prefer to get confirmation from Tejun. If it happens that a drain on a particular CPU gets missed in this path, it's not the end of the world as the CPU offlining drains the pages in another path so nothing gets lost. It would also be acceptable if one CPU was drained twice because the workqueue was unbound from a CPU being hot-removed and moved during a hotplug operation. 
If I'm reading this right, a hot-remove will set the pool POOL_DISASSOCIATED and unbound. A workqueue queued for draining gets migrated during hot-remove and a drain operation will execute twice on a CPU -- one for what was queued and a second time for the CPU it was migrated from. It should still work with flush_work which doesn't appear to block forever if an item got migrated to another workqueue. The actual drain workqueue function is using the CPU ID it's currently running on so it shouldn't get confused. Tejun, did I miss anything? Does a workqueue item queued on a CPU being offline get unbound and a caller can still flush it safely? In this specific case, it's ok that the workqueue item does not run on the CPU it was queued on. -- Mel Gorman SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On 02/07/2017 09:48 AM, Michal Hocko wrote: > On Mon 06-02-17 22:05:30, Mel Gorman wrote: >>> Unfortunately it does not seem to help. >> >> I'm a little stuck on how to best handle this. get_online_cpus() can >> halt forever if the hotplug operation is holding the mutex when calling >> pcpu_alloc. One option would be to add a try_get_online_cpus() helper which >> trylocks the mutex. However, given that drain is so unlikely to actually >> make that make a difference when racing against parallel allocations, >> I think this should be acceptable. >> >> Any objections? >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index 3b93879990fd..a3192447e906 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -3432,7 +3432,17 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned >> int order, >> */ >> if (!page && !drained) { >> unreserve_highatomic_pageblock(ac, false); >> -drain_all_pages(NULL); >> + >> +/* >> + * Only drain from contexts allocating for user allocations. >> + * Kernel allocations could be holding a CPU hotplug-related >> + * mutex, particularly hot-add allocating per-cpu structures >> + * while hotplug-related mutex's are held which would prevent >> + * get_online_cpus ever returning. >> + */ >> +if (gfp_mask & __GFP_HARDWALL) >> +drain_all_pages(NULL); >> + > > This wouldn't work AFAICS. If you look at the lockdep splat, the path > which reverses the locking order (takes pcpu_alloc_mutex prior to > cpu_hotplug.lock is bpf_array_alloc_percpu which is GFP_USER and thus > __GFP_HARDWALL. > > I believe we shouldn't pull any dependency on the hotplug locks inside > the allocator. This is just too fragile! Can we simply drop the > get_online_cpus()? Why do we need it, anyway? Say we are racing with the It was added after I noticed in review that queue_work_on() has a comment that caller must ensure that cpu can't go away, and wondered about it. Also noted that a similar lru_add_drain_all() does it too. > cpu offlining. 
I have to check the code but my impression was that WQ > code will ignore the cpu requested by the work item when the cpu is > going offline. If the offline happens while the worker function already > executes then it has to wait as we run with preemption disabled so we > should be safe here. Or am I missing something obvious? Tejun suggested an alternative solution to avoiding get_online_cpus() in this thread: https://lkml.kernel.org/r/<20170123170329.ga7...@htj.duckdns.org>
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Mon 06-02-17 22:05:30, Mel Gorman wrote: > On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote: > > On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote: > > > On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote: > > >> On 29.1.2017 13:44, Dmitry Vyukov wrote: > > >>> Hello, > > >>> > > >>> I've got the following deadlock report while running syzkaller fuzzer > > >>> on f37208bc3c9c2f811460ef264909dfbc7f605a60: > > >>> > > >>> [ INFO: possible circular locking dependency detected ] > > >>> 4.10.0-rc5-next-20170125 #1 Not tainted > > >>> --- > > >>> syz-executor3/14255 is trying to acquire lock: > > >>> (cpu_hotplug.dep_map){++}, at: [] > > >>> get_online_cpus+0x37/0x90 kernel/cpu.c:239 > > >>> > > >>> but task is already holding lock: > > >>> (pcpu_alloc_mutex){+.+.+.}, at: [] > > >>> pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897 > > >>> > > >>> which lock already depends on the new lock. > > >> > > >> I suspect the dependency comes from recent changes in drain_all_pages(). > > >> They > > >> were later redone (for other reasons, but nice to have another > > >> validation) in > > >> the mmots patch [1], which AFAICS is not yet in mmotm and thus > > >> linux-next. Could > > >> you try if it helps? > > > > > > It happened only once on linux-next, so I can't verify the fix. But I > > > will watch out for other occurrences. > > > > Unfortunately it does not seem to help. > > I'm a little stuck on how to best handle this. get_online_cpus() can > halt forever if the hotplug operation is holding the mutex when calling > pcpu_alloc. One option would be to add a try_get_online_cpus() helper which > trylocks the mutex. However, given that drain is so unlikely to actually > make that make a difference when racing against parallel allocations, > I think this should be acceptable. > > Any objections? 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index 3b93879990fd..a3192447e906 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -3432,7 +3432,17 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned > int order, >*/ > if (!page && !drained) { > unreserve_highatomic_pageblock(ac, false); > - drain_all_pages(NULL); > + > + /* > + * Only drain from contexts allocating for user allocations. > + * Kernel allocations could be holding a CPU hotplug-related > + * mutex, particularly hot-add allocating per-cpu structures > + * while hotplug-related mutex's are held which would prevent > + * get_online_cpus ever returning. > + */ > + if (gfp_mask & __GFP_HARDWALL) > + drain_all_pages(NULL); > + This wouldn't work AFAICS. If you look at the lockdep splat, the path which reverses the locking order (takes pcpu_alloc_mutex prior to cpu_hotplug.lock) is bpf_array_alloc_percpu which is GFP_USER and thus __GFP_HARDWALL. I believe we shouldn't pull any dependency on the hotplug locks inside the allocator. This is just too fragile! Can we simply drop the get_online_cpus()? Why do we need it, anyway? Say we are racing with the cpu offlining. I have to check the code but my impression was that WQ code will ignore the cpu requested by the work item when the cpu is going offline. If the offline happens while the worker function already executes then it has to wait as we run with preemption disabled so we should be safe here. Or am I missing something obvious? -- Michal Hocko SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Mon 06-02-17 20:13:35, Dmitry Vyukov wrote: [...] > Fuzzer now runs on 510948533b059f4f5033464f9f4a0c32d4ab0c08 of > mmotm/auto-latest > (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git): > > commit 510948533b059f4f5033464f9f4a0c32d4ab0c08 > Date: Thu Feb 2 10:08:47 2017 +0100 > mmotm: userfaultfd-non-cooperative-add-event-for-memory-unmaps-fix > > The commit you referenced is already there: > > commit 806b158031ca0b4714e775898396529a758ebc2c > Date: Thu Feb 2 08:53:16 2017 +0100 > mm, page_alloc: use static global work_struct for draining per-cpu pages > > But I still got: > > [ INFO: possible circular locking dependency detected ] > 4.9.0 #6 Not tainted > --- > syz-executor1/8199 is trying to acquire lock: > (cpu_hotplug.dep_map){++}, at: [] > get_online_cpus+0x37/0x90 kernel/cpu.c:246 > but task is already holding lock: > (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbda/0x1280 > mm/percpu.c:896 > which lock already depends on the new lock. > the original was too hard to read so here is the reformatted output. 
> the existing dependency chain (in reverse order) is: > > [ 403.953319] [] validate_chain > kernel/locking/lockdep.c:2265 [inline] > [ 403.953319] [] __lock_acquire+0x2149/0x3430 > kernel/locking/lockdep.c:3338 > [ 403.961232] [] lock_acquire+0x2a1/0x630 > kernel/locking/lockdep.c:3753 > [ 403.968788] [] __mutex_lock_common > kernel/locking/mutex.c:521 [inline] > [ 403.968788] [] mutex_lock_nested+0x24e/0xff0 > kernel/locking/mutex.c:621 > [ 403.976782] [] pcpu_alloc+0xbda/0x1280 mm/percpu.c:896 > [ 403.984266] [] __alloc_percpu+0x24/0x30 mm/percpu.c:1075 > [ 403.991873] [] smpcfd_prepare_cpu+0x73/0xd0 > kernel/smp.c:44 > [ 403.999799] [] cpuhp_invoke_callback+0x254/0x1480 > kernel/cpu.c:136 > [ 404.008253] [] cpuhp_up_callbacks+0x81/0x2a0 > kernel/cpu.c:493 > [ 404.016365] [] _cpu_up+0x1e3/0x2a0 kernel/cpu.c:1057 > [ 404.023507] [] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087 > [ 404.030647] [] cpu_up+0x18/0x20 kernel/cpu.c:1095 > [ 404.037523] [] smp_init+0xe9/0xee kernel/smp.c:564 > [ 404.044559] [] kernel_init_freeable+0x439/0x690 > init/main.c:1010 > [ 404.052811] [] kernel_init+0x13/0x180 init/main.c:941 > [ 404.060198] [] ret_from_fork+0x2a/0x40 > arch/x86/entry/entry_64.S:433 cpu_hotplug_begin cpu_hotplug.lock pcpu_alloc pcpu_alloc_mutex > [ 404.072827] [] validate_chain > kernel/locking/lockdep.c:2265 [inline] > [ 404.072827] [] __lock_acquire+0x2149/0x3430 > kernel/locking/lockdep.c:3338 > [ 404.080733] [] lock_acquire+0x2a1/0x630 > kernel/locking/lockdep.c:3753 > [ 404.088311] [] __mutex_lock_common > kernel/locking/mutex.c:521 [inline] > [ 404.088311] [] mutex_lock_nested+0x24e/0xff0 > kernel/locking/mutex.c:621 > [ 404.096318] [] cpu_hotplug_begin+0x206/0x2e0 > kernel/cpu.c:304 > [ 404.104321] [] _cpu_up+0xca/0x2a0 kernel/cpu.c:1011 > [ 404.111357] [] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087 > [ 404.118480] [] cpu_up+0x18/0x20 kernel/cpu.c:1095 > [ 404.125360] [] smp_init+0xe9/0xee kernel/smp.c:564 > [ 404.132393] [] kernel_init_freeable+0x439/0x690 > 
init/main.c:1010 > [ 404.140668] [] kernel_init+0x13/0x180 init/main.c:941 > [ 404.148079] [] ret_from_fork+0x2a/0x40 > arch/x86/entry/entry_64.S:433 cpu_hotplug_begin cpu_hotplug.lock > [ 404.160977] [] check_prev_add > kernel/locking/lockdep.c:1828 [inline] > [ 404.160977] [] check_prevs_add+0xa8d/0x1c00 > kernel/locking/lockdep.c:1938 > [ 404.168898] [] validate_chain > kernel/locking/lockdep.c:2265 [inline] > [ 404.168898] [] __lock_acquire+0x2149/0x3430 > kernel/locking/lockdep.c:3338 > [ 404.176844] [] lock_acquire+0x2a1/0x630 > kernel/locking/lockdep.c:3753 > [ 404.184416] [] get_online_cpus+0x62/0x90 kernel/cpu.c:248 > [ 404.192103] [] drain_all_pages+0xf8/0x710 > mm/page_alloc.c:2385 > [ 404.199880] [] __alloc_pages_direct_reclaim > mm/page_alloc.c:3440 [inline] > [ 404.199880] [] __alloc_pages_slowpath+0x8fd/0x2370 > mm/page_alloc.c:3778 > [ 404.208406] [] __alloc_pages_nodemask+0x8f5/0xc60 > mm/page_alloc.c:3980 > [ 404.216851] [] __alloc_pages include/linux/gfp.h:426 > [inline] > [ 404.216851] [] __alloc_pages_node > include/linux/gfp.h:439 [inline] > [ 404.216851] [] alloc_pages_node include/linux/gfp.h:453 > [inline] > [ 404.216851] [] pcpu_alloc_pages mm/percpu-vm.c:93 > [inline] > [ 404.216851] [] pcpu_populate_chunk+0x1e1/0x900 > mm/percpu-vm.c:282 > [ 404.225015] [] pcpu_alloc+0xe01/0x1280 mm/percpu.c:998 > [ 404.232482] [] __alloc_percpu_gfp+0x27/0x30 > mm/percpu.c:1062 > [ 404.240389] [] bpf_array_alloc_percpu > kernel/bpf/arraymap.c:34 [inline] > [ 404.240389] [] array_map_alloc+0x532/0x710 > kernel/bpf/arraymap.c:99 > [ 404.248224] [] find_and_alloc_map > kernel/bpf/syscall.c:34 [inline] > [ 404.248224] [] map_create kernel/bpf/syscall.c:188 > [inline] > [ 404.248224] [] SYSC_bpf kernel/bpf/syscall.c:870 [inline] > [ 404.248224] [
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Mon, Feb 06, 2017 at 08:13:35PM +0100, Dmitry Vyukov wrote:
> On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote:
> > On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote:
> >> On 29.1.2017 13:44, Dmitry Vyukov wrote:
> >>> Hello,
> >>>
> >>> I've got the following deadlock report while running syzkaller fuzzer
> >>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
> >>>
> >>> [ INFO: possible circular locking dependency detected ]
> >>> 4.10.0-rc5-next-20170125 #1 Not tainted
> >>> ---
> >>> syz-executor3/14255 is trying to acquire lock:
> >>>  (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x37/0x90 kernel/cpu.c:239
> >>>
> >>> but task is already holding lock:
> >>>  (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
> >>>
> >>> which lock already depends on the new lock.
> >>
> >> I suspect the dependency comes from recent changes in drain_all_pages().
> >> They were later redone (for other reasons, but nice to have another
> >> validation) in the mmots patch [1], which AFAICS is not yet in mmotm and
> >> thus linux-next. Could you try if it helps?
> >
> > It happened only once on linux-next, so I can't verify the fix. But I
> > will watch out for other occurrences.
>
> Unfortunately it does not seem to help.

I'm a little stuck on how best to handle this. get_online_cpus() can halt
forever if the hotplug operation is holding the mutex when calling
pcpu_alloc. One option would be to add a try_get_online_cpus() helper which
trylocks the mutex. However, given that the drain is so unlikely to actually
make a difference when racing against parallel allocations, I think this
should be acceptable. Any objections?
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3b93879990fd..a3192447e906 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3432,7 +3432,17 @@ __alloc_pages_direct_reclaim(gfp_t gfp_mask, unsigned int order,
 	 */
 	if (!page && !drained) {
 		unreserve_highatomic_pageblock(ac, false);
-		drain_all_pages(NULL);
+
+		/*
+		 * Only drain from contexts allocating for user allocations.
+		 * Kernel allocations could be holding a CPU hotplug-related
+		 * mutex, particularly hot-add allocating per-cpu structures
+		 * while hotplug-related mutexes are held which would prevent
+		 * get_online_cpus ever returning.
+		 */
+		if (gfp_mask & __GFP_HARDWALL)
+			drain_all_pages(NULL);
+
 		drained = true;
 		goto retry;
 	}

--
Mel Gorman
SUSE Labs
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Mon, Jan 30, 2017 at 4:48 PM, Dmitry Vyukov wrote:
> On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote:
>> On 29.1.2017 13:44, Dmitry Vyukov wrote:
>>> Hello,
>>>
>>> I've got the following deadlock report while running syzkaller fuzzer
>>> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
>>>
>>> [ INFO: possible circular locking dependency detected ]
>>> 4.10.0-rc5-next-20170125 #1 Not tainted
>>> ---
>>> syz-executor3/14255 is trying to acquire lock:
>>>  (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x37/0x90 kernel/cpu.c:239
>>>
>>> but task is already holding lock:
>>>  (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
>>>
>>> which lock already depends on the new lock.
>>
>> I suspect the dependency comes from recent changes in drain_all_pages(). They
>> were later redone (for other reasons, but nice to have another validation) in
>> the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next.
>> Could you try if it helps?
>
> It happened only once on linux-next, so I can't verify the fix. But I
> will watch out for other occurrences.

Unfortunately it does not seem to help.
Fuzzer now runs on 510948533b059f4f5033464f9f4a0c32d4ab0c08 of mmotm/auto-latest
(git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git):

commit 510948533b059f4f5033464f9f4a0c32d4ab0c08
Date:   Thu Feb 2 10:08:47 2017 +0100
    mmotm: userfaultfd-non-cooperative-add-event-for-memory-unmaps-fix

The commit you referenced is already there:

commit 806b158031ca0b4714e775898396529a758ebc2c
Date:   Thu Feb 2 08:53:16 2017 +0100
    mm, page_alloc: use static global work_struct for draining per-cpu pages

But I still got:

[ INFO: possible circular locking dependency detected ]
4.9.0 #6 Not tainted
---
syz-executor1/8199 is trying to acquire lock:
 (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x37/0x90 kernel/cpu.c:246

but task is already holding lock:
 (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbda/0x1280 mm/percpu.c:896

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (pcpu_alloc_mutex){+.+.+.}:
[ 403.953319] [] validate_chain kernel/locking/lockdep.c:2265 [inline]
[ 403.953319] [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[ 403.961232] [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[ 403.968788] [] __mutex_lock_common kernel/locking/mutex.c:521 [inline]
[ 403.968788] [] mutex_lock_nested+0x24e/0xff0 kernel/locking/mutex.c:621
[ 403.976782] [] pcpu_alloc+0xbda/0x1280 mm/percpu.c:896
[ 403.984266] [] __alloc_percpu+0x24/0x30 mm/percpu.c:1075
[ 403.991873] [] smpcfd_prepare_cpu+0x73/0xd0 kernel/smp.c:44
[ 403.999799] [] cpuhp_invoke_callback+0x254/0x1480 kernel/cpu.c:136
[ 404.008253] [] cpuhp_up_callbacks+0x81/0x2a0 kernel/cpu.c:493
[ 404.016365] [] _cpu_up+0x1e3/0x2a0 kernel/cpu.c:1057
[ 404.023507] [] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
[ 404.030647] [] cpu_up+0x18/0x20 kernel/cpu.c:1095
[ 404.037523] [] smp_init+0xe9/0xee kernel/smp.c:564
[ 404.044559] [] kernel_init_freeable+0x439/0x690 init/main.c:1010
[ 404.052811] [] kernel_init+0x13/0x180 init/main.c:941
[ 404.060198] [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

-> #1 (cpu_hotplug.lock){+.+.+.}:
[ 404.072827] [] validate_chain kernel/locking/lockdep.c:2265 [inline]
[ 404.072827] [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[ 404.080733] [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[ 404.088311] [] __mutex_lock_common kernel/locking/mutex.c:521 [inline]
[ 404.088311] [] mutex_lock_nested+0x24e/0xff0 kernel/locking/mutex.c:621
[ 404.096318] [] cpu_hotplug_begin+0x206/0x2e0 kernel/cpu.c:304
[ 404.104321] [] _cpu_up+0xca/0x2a0 kernel/cpu.c:1011
[ 404.111357] [] do_cpu_up+0x73/0xa0 kernel/cpu.c:1087
[ 404.118480] [] cpu_up+0x18/0x20 kernel/cpu.c:1095
[ 404.125360] [] smp_init+0xe9/0xee kernel/smp.c:564
[ 404.132393] [] kernel_init_freeable+0x439/0x690 init/main.c:1010
[ 404.140668] [] kernel_init+0x13/0x180 init/main.c:941
[ 404.148079] [] ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:433

-> #0 (cpu_hotplug.dep_map){++}:
[ 404.160977] [] check_prev_add kernel/locking/lockdep.c:1828 [inline]
[ 404.160977] [] check_prevs_add+0xa8d/0x1c00 kernel/locking/lockdep.c:1938
[ 404.168898] [] validate_chain kernel/locking/lockdep.c:2265 [inline]
[ 404.168898] [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[ 404.176844] [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[ 404.184416] [] get_online_cpus+0x62/0x90 kernel/cpu.c:248
[ 404.192103] [] drain_all_pages+0xf8/0x710 mm/page_alloc.c:2385
[ 404.199880] [] __alloc_pages_direct_reclaim mm/page_alloc.c:
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On Sun, Jan 29, 2017 at 6:22 PM, Vlastimil Babka wrote: > On 29.1.2017 13:44, Dmitry Vyukov wrote: >> Hello, >> >> I've got the following deadlock report while running syzkaller fuzzer >> on f37208bc3c9c2f811460ef264909dfbc7f605a60: >> >> [ INFO: possible circular locking dependency detected ] >> 4.10.0-rc5-next-20170125 #1 Not tainted >> --- >> syz-executor3/14255 is trying to acquire lock: >> (cpu_hotplug.dep_map){++}, at: [] >> get_online_cpus+0x37/0x90 kernel/cpu.c:239 >> >> but task is already holding lock: >> (pcpu_alloc_mutex){+.+.+.}, at: [] >> pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897 >> >> which lock already depends on the new lock. > > I suspect the dependency comes from recent changes in drain_all_pages(). They > were later redone (for other reasons, but nice to have another validation) in > the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next. > Could > you try if it helps? It happened only once on linux-next, so I can't verify the fix. But I will watch out for other occurrences.
Re: mm: deadlock between get_online_cpus/pcpu_alloc
On 29.1.2017 13:44, Dmitry Vyukov wrote:
> Hello,
>
> I've got the following deadlock report while running syzkaller fuzzer
> on f37208bc3c9c2f811460ef264909dfbc7f605a60:
>
> [ INFO: possible circular locking dependency detected ]
> 4.10.0-rc5-next-20170125 #1 Not tainted
> ---
> syz-executor3/14255 is trying to acquire lock:
>  (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x37/0x90 kernel/cpu.c:239
>
> but task is already holding lock:
>  (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
>
> which lock already depends on the new lock.

I suspect the dependency comes from recent changes in drain_all_pages(). They
were later redone (for other reasons, but nice to have another validation) in
the mmots patch [1], which AFAICS is not yet in mmotm and thus linux-next.
Could you try if it helps?

Vlastimil

[1] http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-use-static-global-work_struct-for-draining-per-cpu-pages.patch

>
> the existing dependency chain (in reverse order) is:
>
> -> #2 (pcpu_alloc_mutex){+.+.+.}:
>
> [] validate_chain kernel/locking/lockdep.c:2265 [inline]
> [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
> [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
> [] __mutex_lock_common kernel/locking/mutex.c:757 [inline]
> [] __mutex_lock+0x382/0x25c0 kernel/locking/mutex.c:894
> [] mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
> [] pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
> [] __alloc_percpu+0x24/0x30 mm/percpu.c:1076
> [] smpcfd_prepare_cpu+0x73/0xd0 kernel/smp.c:47
> [] cpuhp_invoke_callback+0x256/0x1480 kernel/cpu.c:136
> [] cpuhp_up_callbacks+0x81/0x2a0 kernel/cpu.c:425
> [] _cpu_up+0x1e3/0x2a0 kernel/cpu.c:940
> [] do_cpu_up+0x73/0xa0 kernel/cpu.c:970
> [] cpu_up+0x18/0x20 kernel/cpu.c:978
> [] smp_init+0x148/0x160 kernel/smp.c:565
> [] kernel_init_freeable+0x43e/0x695 init/main.c:1026
> [] kernel_init+0x13/0x180 init/main.c:955
> [] ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
>
> -> #1 (cpu_hotplug.lock){+.+.+.}:
>
> [] validate_chain kernel/locking/lockdep.c:2265 [inline]
> [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
> [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
> [] __mutex_lock_common kernel/locking/mutex.c:757 [inline]
> [] __mutex_lock+0x382/0x25c0 kernel/locking/mutex.c:894
> [] mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
> [] cpu_hotplug_begin+0x206/0x2e0 kernel/cpu.c:297
> [] _cpu_up+0xca/0x2a0 kernel/cpu.c:894
> [] do_cpu_up+0x73/0xa0 kernel/cpu.c:970
> [] cpu_up+0x18/0x20 kernel/cpu.c:978
> [] smp_init+0x148/0x160 kernel/smp.c:565
> [] kernel_init_freeable+0x43e/0x695 init/main.c:1026
> [] kernel_init+0x13/0x180 init/main.c:955
> [] ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430
>
> -> #0 (cpu_hotplug.dep_map){++}:
>
> [] check_prev_add kernel/locking/lockdep.c:1828 [inline]
> [] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
> [] validate_chain kernel/locking/lockdep.c:2265 [inline]
> [] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
> [] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
> [] get_online_cpus+0x62/0x90 kernel/cpu.c:241
> [] drain_all_pages.part.98+0x8c/0x8f0 mm/page_alloc.c:2371
> [] drain_all_pages mm/page_alloc.c:2364 [inline]
> [] __alloc_pages_direct_reclaim mm/page_alloc.c:3435 [inline]
> [] __alloc_pages_slowpath+0x966/0x23d0 mm/page_alloc.c:3773
> [] __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3975
> [] __alloc_pages include/linux/gfp.h:426 [inline]
> [] __alloc_pages_node include/linux/gfp.h:439 [inline]
> [] alloc_pages_node include/linux/gfp.h:453 [inline]
> [] pcpu_alloc_pages mm/percpu-vm.c:93 [inline]
> [] pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282
> [] pcpu_alloc+0xe15/0x1290 mm/percpu.c:999
> [] __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1063
> [] bpf_array_alloc_percpu kernel/bpf/arraymap.c:33 [inline]
> [] array_map_alloc+0x543/0x700 kernel/bpf/arraymap.c:94
> [] find_and_alloc_map kernel/bpf/syscall.c:37 [inline]
> [] map_create kernel/bpf/syscall.c:228 [inline]
> [] SYSC_bpf kernel/bpf/syscall.c:1040 [inline]
> [] SyS_bpf+0x108d/0x27c0 kernel/bpf/syscall.c:997
> [] entry_SYSCALL_64_fastpath+0x1f/0xc2
>
> other info that might help us debug this:
>
> Chain exists of:
>   cpu_hotplug.dep_map --> cpu_hotplug.lock --> pcpu_alloc_mutex
>
> Possible unsafe locking scenario:
>
>        CPU0                    CPU1
>   lock(pcpu_alloc_mutex);
>                                lock(cpu_hotplug.lock);
>                                lock(pcpu_alloc_mutex);
>   lock(cpu_hotplug.dep_map);
>
> *** DEADLOCK ***
>
> 1 lock held by syz-executor3/14255:
>  #0: (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
>
> stack backtrace:
> CPU: 1 PID: 14255 Comm: syz-executor3 Not tainted 4.10.0-rc5-next-20170125 #1
> Hardware name: Google Goo
mm: deadlock between get_online_cpus/pcpu_alloc
Hello,

I've got the following deadlock report while running syzkaller fuzzer on
f37208bc3c9c2f811460ef264909dfbc7f605a60:

[ INFO: possible circular locking dependency detected ]
4.10.0-rc5-next-20170125 #1 Not tainted
---
syz-executor3/14255 is trying to acquire lock:
 (cpu_hotplug.dep_map){++}, at: [] get_online_cpus+0x37/0x90 kernel/cpu.c:239

but task is already holding lock:
 (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (pcpu_alloc_mutex){+.+.+.}:
[] validate_chain kernel/locking/lockdep.c:2265 [inline]
[] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[] __mutex_lock_common kernel/locking/mutex.c:757 [inline]
[] __mutex_lock+0x382/0x25c0 kernel/locking/mutex.c:894
[] mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
[] pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897
[] __alloc_percpu+0x24/0x30 mm/percpu.c:1076
[] smpcfd_prepare_cpu+0x73/0xd0 kernel/smp.c:47
[] cpuhp_invoke_callback+0x256/0x1480 kernel/cpu.c:136
[] cpuhp_up_callbacks+0x81/0x2a0 kernel/cpu.c:425
[] _cpu_up+0x1e3/0x2a0 kernel/cpu.c:940
[] do_cpu_up+0x73/0xa0 kernel/cpu.c:970
[] cpu_up+0x18/0x20 kernel/cpu.c:978
[] smp_init+0x148/0x160 kernel/smp.c:565
[] kernel_init_freeable+0x43e/0x695 init/main.c:1026
[] kernel_init+0x13/0x180 init/main.c:955
[] ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430

-> #1 (cpu_hotplug.lock){+.+.+.}:
[] validate_chain kernel/locking/lockdep.c:2265 [inline]
[] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[] __mutex_lock_common kernel/locking/mutex.c:757 [inline]
[] __mutex_lock+0x382/0x25c0 kernel/locking/mutex.c:894
[] mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:909
[] cpu_hotplug_begin+0x206/0x2e0 kernel/cpu.c:297
[] _cpu_up+0xca/0x2a0 kernel/cpu.c:894
[] do_cpu_up+0x73/0xa0 kernel/cpu.c:970
[] cpu_up+0x18/0x20 kernel/cpu.c:978
[] smp_init+0x148/0x160 kernel/smp.c:565
[] kernel_init_freeable+0x43e/0x695 init/main.c:1026
[] kernel_init+0x13/0x180 init/main.c:955
[] ret_from_fork+0x31/0x40 arch/x86/entry/entry_64.S:430

-> #0 (cpu_hotplug.dep_map){++}:
[] check_prev_add kernel/locking/lockdep.c:1828 [inline]
[] check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
[] validate_chain kernel/locking/lockdep.c:2265 [inline]
[] __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
[] lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
[] get_online_cpus+0x62/0x90 kernel/cpu.c:241
[] drain_all_pages.part.98+0x8c/0x8f0 mm/page_alloc.c:2371
[] drain_all_pages mm/page_alloc.c:2364 [inline]
[] __alloc_pages_direct_reclaim mm/page_alloc.c:3435 [inline]
[] __alloc_pages_slowpath+0x966/0x23d0 mm/page_alloc.c:3773
[] __alloc_pages_nodemask+0x8f5/0xc60 mm/page_alloc.c:3975
[] __alloc_pages include/linux/gfp.h:426 [inline]
[] __alloc_pages_node include/linux/gfp.h:439 [inline]
[] alloc_pages_node include/linux/gfp.h:453 [inline]
[] pcpu_alloc_pages mm/percpu-vm.c:93 [inline]
[] pcpu_populate_chunk+0x1e1/0x900 mm/percpu-vm.c:282
[] pcpu_alloc+0xe15/0x1290 mm/percpu.c:999
[] __alloc_percpu_gfp+0x27/0x30 mm/percpu.c:1063
[] bpf_array_alloc_percpu kernel/bpf/arraymap.c:33 [inline]
[] array_map_alloc+0x543/0x700 kernel/bpf/arraymap.c:94
[] find_and_alloc_map kernel/bpf/syscall.c:37 [inline]
[] map_create kernel/bpf/syscall.c:228 [inline]
[] SYSC_bpf kernel/bpf/syscall.c:1040 [inline]
[] SyS_bpf+0x108d/0x27c0 kernel/bpf/syscall.c:997
[] entry_SYSCALL_64_fastpath+0x1f/0xc2

other info that might help us debug this:

Chain exists of:
  cpu_hotplug.dep_map --> cpu_hotplug.lock --> pcpu_alloc_mutex

Possible unsafe locking scenario:

       CPU0                    CPU1
  lock(pcpu_alloc_mutex);
                               lock(cpu_hotplug.lock);
                               lock(pcpu_alloc_mutex);
  lock(cpu_hotplug.dep_map);

*** DEADLOCK ***

1 lock held by syz-executor3/14255:
 #0: (pcpu_alloc_mutex){+.+.+.}, at: [] pcpu_alloc+0xbfe/0x1290 mm/percpu.c:897

stack backtrace:
CPU: 1 PID: 14255 Comm: syz-executor3 Not tainted 4.10.0-rc5-next-20170125 #1
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:15 [inline]
 dump_stack+0x2ee/0x3ef lib/dump_stack.c:51
 print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1202
 check_prev_add kernel/locking/lockdep.c:1828 [inline]
 check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1938
 validate_chain kernel/locking/lockdep.c:2265 [inline]
 __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3338
 lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3753
 get_online_cpus+0x62/0x90 kernel/cpu.c:241
 drain_all_pages.part.98+0x8c/0x8f0 mm/page_alloc.c:2371
 drain_all_pages mm/page_alloc.c:2364 [inline]
 __alloc