Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-18 Thread Greg KH
On Wed, Aug 05, 2020 at 10:46:12PM -0700, Hugh Dickins wrote: > On Mon, 27 Jul 2020, Greg KH wrote: > > > > Linus just pointed me at this thread. > > > > If you could run: > > echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control > > and run the same workload to see if

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-07 Thread Matthew Wilcox
On Fri, Aug 07, 2020 at 11:41:09AM -0700, Hugh Dickins wrote: > My home testing was, effectively, on top of c6fe44d96fc1 (v5.8 plus > your two patches): I did not have in the io_uring changes from the > current tree. In glancing there, I noticed one new and one previous > instance of

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-07 Thread Linus Torvalds
On Fri, Aug 7, 2020 at 11:41 AM Hugh Dickins wrote: > > + > + /* > +* If we hoped to pass PG_locked on to the next locker, but found > +* no exclusive taker, then we must clear it before dropping q->lock. > +* Otherwise unlock_page() can race

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-07 Thread Hugh Dickins
On Thu, 6 Aug 2020, Linus Torvalds wrote: > On Thu, Aug 6, 2020 at 11:00 AM Matthew Wilcox wrote: > > > > It wasn't clear to me whether Hugh thought it was an improvement or > > not that trylock was now less likely to jump the queue. There're > > the usual "fair is the opposite of throughput"

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-06 Thread Linus Torvalds
On Thu, Aug 6, 2020 at 11:00 AM Matthew Wilcox wrote: > > It wasn't clear to me whether Hugh thought it was an improvement or > not that trylock was now less likely to jump the queue. There're > the usual "fair is the opposite of throughput" kind of arguments. Yeah, it could go either way. But

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-06 Thread Matthew Wilcox
On Thu, Aug 06, 2020 at 10:07:07AM -0700, Linus Torvalds wrote: > On Wed, Aug 5, 2020 at 10:21 PM Hugh Dickins wrote: > > Something I was interested to realize in looking at this: trylock_page() > > on a contended lock is now much less likely to jump the queue and > > succeed than before, since

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-06 Thread Linus Torvalds
On Wed, Aug 5, 2020 at 10:21 PM Hugh Dickins wrote: > > Something I was interested to realize in looking at this: trylock_page() > on a contended lock is now much less likely to jump the queue and > succeed than before, since your lock holder hands off the page lock to > the next holder: much

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-05 Thread Hugh Dickins
On Mon, 27 Jul 2020, Greg KH wrote: > > Linus just pointed me at this thread. > > If you could run: > echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control > and run the same workload to see if anything shows up in the log when > xhci crashes, that would be great. Thanks,

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-05 Thread Hugh Dickins
Nice to see the +130.0% this morning. I got back on to this on Monday, here's some follow-up. On Sun, 26 Jul 2020, Hugh Dickins wrote: > > The comparison runs have not yet completed (except for the one started > early), but they have all got past the most interesting tests, and it's > clear

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-03 Thread Linus Torvalds
On Mon, Aug 3, 2020 at 6:14 AM Michal Hocko wrote: > > I hope I got it right and this is the latest version of your patches. Btw. > do you still think that increasing PAGE_WAIT_TABLE_BITS is reasonable. I suspect it's still very reasonable, but I'd love to have numbers for it. > In the meantime

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-08-03 Thread Michal Hocko
Hi, sorry for being late in discussion (was offline or very busy with other stuff). I hope I got it right and this is the latest version of your patches. Btw. do you still think that increasing PAGE_WAIT_TABLE_BITS is reasonable. In the meantime I have learned that the customer suffering from the

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-27 Thread Greg KH
On Sun, Jul 26, 2020 at 01:30:32PM -0700, Hugh Dickins wrote: > On Sat, 25 Jul 2020, Hugh Dickins wrote: > > On Sat, 25 Jul 2020, Hugh Dickins wrote: > > > On Sat, 25 Jul 2020, Linus Torvalds wrote: > > > > On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote: > > > > > > > > > > Heh. I too

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-26 Thread Hugh Dickins
On Sun, 26 Jul 2020, Linus Torvalds wrote: > On Sun, Jul 26, 2020 at 1:30 PM Hugh Dickins wrote: > > > > I've deduced nothing useful from the logs, will have to leave that > > to others here with more experience of them. But my assumption now > > is that you have successfully removed one

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-26 Thread Linus Torvalds
On Sun, Jul 26, 2020 at 1:30 PM Hugh Dickins wrote: > > I've deduced nothing useful from the logs, will have to leave that > to others here with more experience of them. But my assumption now > is that you have successfully removed one bottleneck, so the tests > get somewhat further and now

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-26 Thread Hugh Dickins
On Sat, 25 Jul 2020, Hugh Dickins wrote: > On Sat, 25 Jul 2020, Hugh Dickins wrote: > > On Sat, 25 Jul 2020, Linus Torvalds wrote: > > > On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote: > > > > > > > > Heh. I too thought about this. And just in case, your patch looks > > > > correct > > > >

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-26 Thread Oleg Nesterov
Linus, I was greatly confused and tried to confuse you. Somehow I misunderstood your last version and didn't bother to read it again until now. Sorry for noise and thanks for your explanations. Oleg. On 07/25, Linus Torvalds wrote: > > On Sat, Jul 25, 2020 at 12:28 PM Oleg Nesterov wrote: >

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-25 Thread Hugh Dickins
On Sat, 25 Jul 2020, Hugh Dickins wrote: > On Sat, 25 Jul 2020, Linus Torvalds wrote: > > On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote: > > > > > > Heh. I too thought about this. And just in case, your patch looks correct > > > to me. But I can't really comment this behavioural change.

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-25 Thread Hugh Dickins
On Sat, 25 Jul 2020, Linus Torvalds wrote: > On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote: > > > > Heh. I too thought about this. And just in case, your patch looks correct > > to me. But I can't really comment this behavioural change. Perhaps it > > should come in a separate patch? > >

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-25 Thread Linus Torvalds
On Sat, Jul 25, 2020 at 12:28 PM Oleg Nesterov wrote: > > What I tried to say. AFAICS before that commit we had (almost) the same > behaviour you propose now: unlock_page/etc wakes all the non-exclusive > waiters up. > > No? Yes, but no. We'd wake them _up_ fairly aggressively, but then they'd

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-25 Thread Oleg Nesterov
Firstly, to avoid the confusion, let me repeat I think your patch is fine. I too thought that non-exclusive waiters do not care about the bit state and thus wake_page_function() can simply wake them all up. But then I did "git blame", found your commit 3510ca20ece0150 and came to conclusion

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-25 Thread Linus Torvalds
On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote: > > Heh. I too thought about this. And just in case, your patch looks correct > to me. But I can't really comment this behavioural change. Perhaps it > should come in a separate patch? We could do that. At the same time, I think both parts

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-25 Thread Oleg Nesterov
On 07/24, Linus Torvalds wrote: > > I just realized that one thing we could do is to not even test the > page bit for the shared case in the wakeup path. > > Because anybody who uses the non-exclusive "wait_on_page_locked()" or > "wait_on_page_writeback()" isn't actually interested in the bit

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-25 Thread Oleg Nesterov
On 07/24, Linus Torvalds wrote: > > And in fact, once you do it on top of that, it becomes obvious that we > can share even more code: move the WQ_FLAG_WOKEN logic _into_ the > trylock_page_bit_common() function. Ah, indeed, somehow I didn't realize this, > I added your reviewed-by, but maybe

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-24 Thread Linus Torvalds
On Fri, Jul 24, 2020 at 7:08 PM Hugh Dickins wrote: > > But whatever, what happens on the next run, with these latest patches, > will be more important; and I'll follow this next run with a run on > the baseline without them, to compare results. So the loads you are running are known to have

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-24 Thread Hugh Dickins
On Fri, 24 Jul 2020, Linus Torvalds wrote: > On Fri, Jul 24, 2020 at 10:32 AM Linus Torvalds > wrote: > > Ok, that makes sense. Except you did it on top of the original patch > > without the fix to set WQ_FLAG_WOKEN for the non-wakeup case. > > Hmm. > > I just realized that one thing we could

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-24 Thread Linus Torvalds
On Fri, Jul 24, 2020 at 10:32 AM Linus Torvalds wrote: > Ok, that makes sense. Except you did it on top of the original patch > without the fix to set WQ_FLAG_WOKEN for the non-wakeup case. Hmm. I just realized that one thing we could do is to not even test the page bit for the shared case in

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-24 Thread Linus Torvalds
On Fri, Jul 24, 2020 at 8:24 AM Oleg Nesterov wrote: > > not sure this makes any sense, but this looks like another user of > trylock_page_bit_common(), see the patch below on top of 1/2. Ok, that makes sense. Except you did it on top of the original patch without the fix to set WQ_FLAG_WOKEN

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-24 Thread Oleg Nesterov
On 07/23, Linus Torvalds wrote: > > But I'll walk over my patch mentally one more time. Here's the current > version, anyway. Both patches look correct to me, feel free to add Reviewed-by: Oleg Nesterov > @@ -1013,18 +1014,40 @@ static int wake_page_function(wait_queue_entry_t > *wait,

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-24 Thread Oleg Nesterov
On 07/23, Linus Torvalds wrote: > > IOW, I think we should do something like this (this is on top of my > patch, since it has that wake_page_function() change in it, but notice > how we have the exact same issue in our traditional > autoremove_wake_function() usage). ... > +static inline void

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Hugh Dickins
On Thu, Jul 23, 2020 at 5:47 PM Linus Torvalds wrote: > > On Thu, Jul 23, 2020 at 5:07 PM Hugh Dickins wrote: > > > > I say that for full disclosure, so you don't wrack your brains > > too much, when it may still turn out to be a screwup on my part. > > Sounds unlikely. > > If that patch applied

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Linus Torvalds
On Thu, Jul 23, 2020 at 5:07 PM Hugh Dickins wrote: > > I say that for full disclosure, so you don't wrack your brains > too much, when it may still turn out to be a screwup on my part. Sounds unlikely. If that patch applied even reasonably closely, I don't see how you'd see a list corruption

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Hugh Dickins
On Thu, 23 Jul 2020, Linus Torvalds wrote: > On Thu, Jul 23, 2020 at 4:11 PM Hugh Dickins wrote: > > On Thu, 23 Jul 2020, Linus Torvalds wrote: > > > > > > I'll send a new version after I actually test it. > > > > I'll give it a try when you're happy with it. > > Ok, what I described is what

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Linus Torvalds
On Thu, Jul 23, 2020 at 4:11 PM Hugh Dickins wrote: > > On Thu, 23 Jul 2020, Linus Torvalds wrote: > > > > I'll send a new version after I actually test it. > > I'll give it a try when you're happy with it. Ok, what I described is what I've been running for a while now. But I don't put much

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Hugh Dickins
On Thu, 23 Jul 2020, Linus Torvalds wrote: > > I'll send a new version after I actually test it. I'll give it a try when you're happy with it. I did try yesterday's with my swapping loads on home machines (3 of 4 survived 16 hours), and with some google stresstests on work machines (0 of 10

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Linus Torvalds
On Thu, Jul 23, 2020 at 10:32 AM Linus Torvalds wrote: > > So here's a v2, now as a "real" commit with a commit message and everything. Oh, except it's broken. Switching from the "am I still on the list" logic to the "WQ_FLAG_WOKEN is set if woken" logic was all well and good, but I missed the

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Linus Torvalds
On Thu, Jul 23, 2020 at 11:22 AM Linus Torvalds wrote: > > So I think that is a separate issue, generic to our finish_wait() uses. IOW, I think we should do something like this (this is on top of my patch, since it has that wake_page_function() change in it, but notice how we have the exact same

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Linus Torvalds
On Thu, Jul 23, 2020 at 11:01 AM Oleg Nesterov wrote: > > > + * > > + * We _really_ should have a "list_del_init_careful()" to > > + * properly pair with the unlocked "list_empty_careful()" > > + * in finish_wait(). > > + */ > > + smp_mb(); > > +

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Oleg Nesterov
On 07/23, Linus Torvalds wrote: > > So here's a v2, now as a "real" commit with a commit message and everything. I am already sleeping, will read it tomorrow, but at first glance... > @@ -1013,18 +1014,40 @@ static int wake_page_function(wait_queue_entry_t > *wait, unsigned mode, int sync, >

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Linus Torvalds
On Thu, Jul 23, 2020 at 5:47 AM Oleg Nesterov wrote: > > I still can't convince myself thatI fully understand this patch but I see > nothing really wrong after a quick glance... I guess my comments should be extended further then. Is there anything in particular you think is unclear? > > +

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Oleg Nesterov
On 07/22, Linus Torvalds wrote: > > Comments? Oleg, this should fix the race you talked about too. Yes. I still can't convince myself thatI fully understand this patch but I see nothing really wrong after a quick glance... > + * We can no longer use 'wait' after we've done the > + *

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-23 Thread Michal Hocko
On Wed 22-07-20 11:29:20, Linus Torvalds wrote: > On Tue, Jul 21, 2020 at 8:33 AM Linus Torvalds > wrote: > > > > More likely, it's actually *caused* by that commit 11a19c7b099f, and > > what might be happening is that other CPU's are just adding new > > waiters to the list *while* we're waking

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-22 Thread Linus Torvalds
On Wed, Jul 22, 2020 at 4:42 PM Linus Torvalds wrote: > > NOTE NOTE NOTE! This is both somewhat subtle code, and ENTIRELY > UNTESTED. It seems to boot. It adds more lines than it removes, but a lot of it is comments, and while it's somewhat subtle, I think it's actually conceptually simpler

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-22 Thread Linus Torvalds
On Wed, Jul 22, 2020 at 3:10 PM Linus Torvalds wrote: > > > + bool first_time = true; > > bool thrashing = false; > > bool delayacct = false; > > unsigned long pflags; > > @@ -1134,7 +1135,12 @@ static inline int wait_on_page_bit_commo > >

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-22 Thread Linus Torvalds
On Wed, Jul 22, 2020 at 2:29 PM Hugh Dickins wrote: > > -#define PAGE_WAIT_TABLE_BITS 8 > +#define PAGE_WAIT_TABLE_BITS 10 Well, that seems harmless even on small machines. > + bool first_time = true; > bool thrashing = false; > bool delayacct = false; > unsigned

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-22 Thread Hugh Dickins
On Wed, 22 Jul 2020, Linus Torvalds wrote: > > I do wonder if we should make that PAGE_WAIT_TABLE_SIZE be larger. 256 > entries seems potentially ridiculously small, and aliasing not only > increases the waitqueue length, it also potentially causes more > contention on the waitqueue spinlock

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-22 Thread Linus Torvalds
On Tue, Jul 21, 2020 at 8:33 AM Linus Torvalds wrote: > > More likely, it's actually *caused* by that commit 11a19c7b099f, and > what might be happening is that other CPU's are just adding new > waiters to the list *while* we're waking things up, because somebody > else already got the page lock

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Michal Hocko
On Tue 21-07-20 08:33:33, Linus Torvalds wrote: > On Mon, Jul 20, 2020 at 11:33 PM Michal Hocko wrote: > > > > The lockup is in page_unlock in do_read_fault and I suspect that this is > > yet another effect of a very long waitqueue chain which has been > > addresses by 11a19c7b099f ("sched/wait:

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Linus Torvalds
On Mon, Jul 20, 2020 at 11:33 PM Michal Hocko wrote: > > The lockup is in page_unlock in do_read_fault and I suspect that this is > yet another effect of a very long waitqueue chain which has been > addresses by 11a19c7b099f ("sched/wait: Introduce wakeup boomark in > wake_up_page_bit")

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Michal Hocko
On Tue 21-07-20 15:17:49, Chris Down wrote: > I understand the pragmatic considerations here, but I'm quite concerned > about the maintainability and long-term ability to reason about a patch like > this. For example, how do we know when this patch is safe to remove? Also, > what other precedent

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Chris Down
I understand the pragmatic considerations here, but I'm quite concerned about the maintainability and long-term ability to reason about a patch like this. For example, how do we know when this patch is safe to remove? Also, what other precedent does this set for us covering for poor userspace

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Qian Cai
On Tue, Jul 21, 2020 at 03:38:35PM +0200, Michal Hocko wrote: > On Tue 21-07-20 09:23:44, Qian Cai wrote: > > On Tue, Jul 21, 2020 at 02:17:52PM +0200, Michal Hocko wrote: > > > On Tue 21-07-20 07:44:07, Qian Cai wrote: > > > > > > > > > > > > > On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote:

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Michal Hocko
On Tue 21-07-20 09:23:44, Qian Cai wrote: > On Tue, Jul 21, 2020 at 02:17:52PM +0200, Michal Hocko wrote: > > On Tue 21-07-20 07:44:07, Qian Cai wrote: > > > > > > > > > > On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote: > > > > > > > > Are these really important? I believe I can dig that out

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Qian Cai
On Tue, Jul 21, 2020 at 02:17:52PM +0200, Michal Hocko wrote: > On Tue 21-07-20 07:44:07, Qian Cai wrote: > > > > > > > On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote: > > > > > > Are these really important? I believe I can dig that out from the bug > > > report but I didn't really consider

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Michal Hocko
On Tue 21-07-20 07:44:07, Qian Cai wrote: > > > > On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote: > > > > Are these really important? I believe I can dig that out from the bug > > report but I didn't really consider that important enough. > > Please dig them out. We have also been running

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Qian Cai
> On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote: > > Are these really important? I believe I can dig that out from the bug > report but I didn't really consider that important enough. Please dig them out. We have also been running those things on “large” powerpc as well and never saw such

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Michal Hocko
On Tue 21-07-20 07:10:14, Qian Cai wrote: > > > > On Jul 21, 2020, at 2:33 AM, Michal Hocko wrote: > > > > on a large ppc machine. The very likely cause is a suboptimal > > configuration when systed-udev spawns way too many workders to bring the > > system up. > > This is strange. The problem

Re: [RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Qian Cai
> On Jul 21, 2020, at 2:33 AM, Michal Hocko wrote: > > on a large ppc machine. The very likely cause is a suboptimal > configuration when systed-udev spawns way too many workders to bring the > system up. This is strange. The problem description is missing quite a few important details. For

[RFC PATCH] mm: silence soft lockups from unlock_page

2020-07-21 Thread Michal Hocko
From: Michal Hocko We have seen a bug report with huge number of soft lockups during the system boot on !PREEMPT kernel NMI watchdog: BUG: soft lockup - CPU#1291 stuck for 22s! [systemd-udevd:43283] [...] NIP [c094e66c] _raw_spin_lock_irqsave+0xac/0x100 LR [c094e654]