Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages

2019-09-12 Thread Chris Wilson
Quoting Linus Torvalds (2019-09-12 12:59:25)
> On Thu, Sep 12, 2019 at 12:51 PM Martin Wilck  wrote:
> >
> > Is there an alternative to reverting aa56a292ce62 ("drm/i915/userptr:
> > Acquire the page lock around set_page_dirty()")? And if we do, what
> > would be the consequences? Would other patches need to be reverted,
> > too?
> 
> Looking at that commit, and the backtrace of the lockup, I think that
> reverting it is the correct thing to do.
> 
> You can't take the page lock in invalidate_range(), since it's called
> from try_to_unmap(), which is called with the page lock already held.
> 
> So commit aa56a292ce62 is just fundamentally completely wrong and
> should be reverted.

There's still the dilemma that we get called without the page lock, but
at this moment in time, for the fix to make it into 5.3, the revert
needs to be sent directly to Linus.
-Chris
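Linus's argument rests on the page lock not being recursive. try_to_unmap() already holds it when it reaches the invalidate_range() callback, so a second lock_page() from inside that callback waits for a lock that its own caller will not release until the callback returns. The sketch below is a user-space analogy, not kernel code. It assumes a POSIX error-checking mutex as a stand-in for the kernel's page lock, and the names `fake_page`, `try_to_unmap_model()`, `invalidate_range_model()` and `simulate_recursive_page_lock()` are hypothetical. Because the mutex is error-checking, the recursion shows up as an `EDEADLK` return value instead of a hang.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for struct page. The real page lock is a bit lock
 * (PG_locked) with a wait queue, not a pthread mutex; only the
 * "not recursive" property is modelled here. */
struct fake_page {
    pthread_mutex_t lock;
};

/* Models the reverted change: the notifier callback tries to take the page
 * lock. Returns 0 on success, otherwise the error from pthread_mutex_lock(). */
static int invalidate_range_model(struct fake_page *page)
{
    int err = pthread_mutex_lock(&page->lock);
    if (err == 0)
        pthread_mutex_unlock(&page->lock);
    return err;
}

/* Models try_to_unmap(): the caller already holds the page lock while the
 * invalidate_range() notifier runs. */
static int try_to_unmap_model(struct fake_page *page)
{
    int outer = pthread_mutex_lock(&page->lock);
    assert(outer == 0);

    int inner = invalidate_range_model(page); /* same thread, same lock */

    pthread_mutex_unlock(&page->lock);
    return inner;
}

/* Sets up the lock and runs the nested sequence once. */
int simulate_recursive_page_lock(void)
{
    struct fake_page page;
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    /* ERRORCHECK: relocking by the owner returns EDEADLK instead of blocking. */
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&page.lock, &attr);
    pthread_mutexattr_destroy(&attr);

    int result = try_to_unmap_model(&page);

    pthread_mutex_destroy(&page.lock);
    return result;
}
```

Calling `simulate_recursive_page_lock()` returns `EDEADLK`. With a `PTHREAD_MUTEX_NORMAL` mutex, the same sequence would block forever instead, much like the blocked kcompactd0 thread in the original report.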


Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages

2019-09-12 Thread Martin Wilck
Hi Chris,

On Tue, 2019-09-10 at 17:20 +0300, Leho Kraav wrote:
> On Fri, Aug 09, 2019 at 01:53:43PM +0100, Chris Wilson wrote:
> > Quoting Martin Wilck (2019-08-09 13:41:42)
> > > This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
> > > (5.3-rc3 with just a few patches on top), after starting a KVM virtual
> > > machine. The X screen was frozen. Remote login via ssh was still
> > > possible, thus I was able to retrieve basic logs.
> > > 
> > > sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
> > > minute, the same two processes were still blocked. KVM seems to try to
> > > acquire a lock that kcompactd is holding. kcompactd is waiting for IO
> > > to complete on pages owned by the i915 driver.
> > 
> > My bad, it's known. We haven't decided on whether to revert the
> > unfortunate recursive locking (and so hit another warn elsewhere) or to
> > ignore the dirty pages (and so risk losing data across swap).
> > 
> > cb6d7c7dc7ff ("drm/i915/userptr: Acquire the page lock around
> > set_page_dirty()")
> > -Chris
> 
> Hi Chris. Is this exactly what I'm hitting at
> https://bugs.freedesktop.org/show_bug.cgi?id=111500 perhaps?
> 
> It reliably breaks the graphics userland, as the machine consistently
> freezes at random moments.
> 
> Are there any workaround options, even with a performance penalty? Could I
> revert cb6d7c7dc7ff, and what would the side effects be?
> 
> 5.3 has useful NVMe power management updates for laptops, so I'd like to
> stick with the newest release if possible.

There's a considerable risk that many users will start seeing this
regression when 5.3 is released. I am not aware of a workaround.

Is there an alternative to reverting aa56a292ce62 ("drm/i915/userptr:
Acquire the page lock around set_page_dirty()")? And if we do, what
would be the consequences? Would other patches need to be reverted,
too?

Thanks,
Martin



Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages

2019-09-12 Thread l...@kraav.com
On Thu, Sep 12, 2019 at 11:23:09AM +, Martin Wilck wrote:
> 
> There's a considerable risk that many users will start seeing this
> regression when 5.3 is released. I am not aware of a workaround.
> 
> Is there an alternative to reverting aa56a292ce62 ("drm/i915/userptr:
> Acquire the page lock around set_page_dirty()")? And if we do, what
> would be the consequences? Would other patches need to be reverted,
> too?

I've been running with the revert patch for a couple of days and have not
encountered any kernel warnings thus far, nor any other ill effects that
could be attributed to this locking mechanism.

But I'm far from familiar with these subsystems.

Graphics does not hang anymore.

I've also received developer feedback in private that this really should
be fixed before the 5.3 release.

-- 
Leho Kraav, senior technology & digital marketing architect


Re: 5.3-rc3: Frozen graphics with kcompactd migrating i915 pages

2019-09-10 Thread Leho Kraav
On Fri, Aug 09, 2019 at 01:53:43PM +0100, Chris Wilson wrote:
> Quoting Martin Wilck (2019-08-09 13:41:42)
> > This happened to me today, running kernel 5.3.0-rc3-1.g571863b-default
> > (5.3-rc3 with just a few patches on top), after starting a KVM virtual
> > machine. The X screen was frozen. Remote login via ssh was still
> > possible, thus I was able to retrieve basic logs.
> > 
> > sysrq-w showed two blocked processes (kcompactd0 and KVM). After a
> > minute, the same two processes were still blocked. KVM seems to try to
> > acquire a lock that kcompactd is holding. kcompactd is waiting for IO
> > to complete on pages owned by the i915 driver.
> 
> My bad, it's known. We haven't decided on whether to revert the
> unfortunate recursive locking (and so hit another warn elsewhere) or to
> ignore the dirty pages (and so risk losing data across swap).
> 
> cb6d7c7dc7ff ("drm/i915/userptr: Acquire the page lock around 
> set_page_dirty()")
> -Chris

Hi Chris. Is this exactly what I'm hitting at
https://bugs.freedesktop.org/show_bug.cgi?id=111500 perhaps?

It reliably breaks the graphics userland, as the machine consistently
freezes at random moments.

Are there any workaround options, even with a performance penalty? Could I
revert cb6d7c7dc7ff, and what would the side effects be?

5.3 has useful NVMe power management updates for laptops, so I'd like to
stick with the newest release if possible.