Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-06-26 Thread J. R. Okajima
Jani Nikula:
> On Thu, 15 Jun 2017, "J. R. Okajima"  wrote:
> > How about v4.11.x series?
> > I got v4.11.5, but it doesn't contain the fix.
> > Do you have a plan?
>
> The upstream commit has the proper Cc: stable and Fixes: tags in place,
> it just takes a while for the patches to trickle to stable kernels.

Now I can see the fix exists in v4.11.7.
Thank you.


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-06-26 Thread J. R. Okajima
Jani Nikula:
> On Thu, 15 Jun 2017, "J. R. Okajima"  wrote:
> > How about v4.11.x series?
> > I got v4.11.5, but it doesn't contain the fix.
> > Do you have a plan?
>
> The upstream commit has the proper Cc: stable and Fixes: tags in place,
> it just takes a while for the patches to trickle to stable kernels.

Now I can see the fix exists in v4.11.7.
Thank you.


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-06-15 Thread Jani Nikula
On Thu, 15 Jun 2017, "J. R. Okajima"  wrote:
> Thanx, I got linux-v4.12-rc4 and it contains
>   4681ee2 2017-05-18 drm/i915: Do not sync RCU during shrinking
>
> How about v4.11.x series?
> I got v4.11.5, but it doesn't contain the fix.
> Do you have a plan?

The upstream commit has the proper Cc: stable and Fixes: tags in place,
it just takes a while for the patches to trickle to stable kernels.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-06-15 Thread Jani Nikula
On Thu, 15 Jun 2017, "J. R. Okajima"  wrote:
> Thanx, I got linux-v4.12-rc4 and it contains
>   4681ee2 2017-05-18 drm/i915: Do not sync RCU during shrinking
>
> How about v4.11.x series?
> I got v4.11.5, but it doesn't contain the fix.
> Do you have a plan?

The upstream commit has the proper Cc: stable and Fixes: tags in place,
it just takes a while for the patches to trickle to stable kernels.

BR,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-06-14 Thread J. R. Okajima
Joonas Lahtinen:
> Don't worry, it's not lost. It was merged to drm-intel-fixes and thus is in 
> the pipeline.
>
> There were some unexpected delays getting fixes in, sorry for that.

Thanx, I got linux-v4.12-rc4 and it contains
4681ee2 2017-05-18 drm/i915: Do not sync RCU during shrinking

How about v4.11.x series?
I got v4.11.5, but it doesn't contain the fix.
Do you have a plan?


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-06-14 Thread J. R. Okajima
Joonas Lahtinen:
> Don't worry, it's not lost. It was merged to drm-intel-fixes and thus is in 
> the pipeline.
>
> There were some unexpected delays getting fixes in, sorry for that.

Thanx, I got linux-v4.12-rc4 and it contains
4681ee2 2017-05-18 drm/i915: Do not sync RCU during shrinking

How about v4.11.x series?
I got v4.11.5, but it doesn't contain the fix.
Do you have a plan?


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-31 Thread Joonas Lahtinen
On ti, 2017-05-30 at 13:00 -0700, Hugh Dickins wrote:
> On Mon, 22 May 2017, Joonas Lahtinen wrote:
> > On la, 2017-05-20 at 10:56 +0900, J. R. Okajima wrote:
> > > "J. R. Okajima":
> > > > 
> > > > I don't know whether the fix is good to me or not yet. I will test your
> > > > fix, but I am busy now and my test will be a few weeks later. Other
> > > > people may want the fix soon. So I'd suggest you to reproduce the
> > > > problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
> > > > to reproduce the problem.
> > > > Of course, if you are sure the fix is correct, then you don't have to
> > > > wait for my test. Release it soon for other people.
> > > 
> > > Now I am going to able to run my local test.
> > > But V3 patch failed to apply onto v4.11.0.
> > > 
> > > Would you provide us two versions of the patch, one for v4.11 series and
> > > the other of v4.12-rcN?
> > 
> > For v4.12-rc2 the backport is here:
> > 
> > https://patchwork.freedesktop.org/patch/156990/
> 
> This fix seems to have got lost: we've been waiting a month,
> and 4.12 is now at rc3: please expedite the unexpedited :)

Don't worry, it's not lost. It was merged to drm-intel-fixes and thus is in the 
pipeline.

There were some unexpected delays getting fixes in, sorry for that.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-31 Thread Joonas Lahtinen
On ti, 2017-05-30 at 13:00 -0700, Hugh Dickins wrote:
> On Mon, 22 May 2017, Joonas Lahtinen wrote:
> > On la, 2017-05-20 at 10:56 +0900, J. R. Okajima wrote:
> > > "J. R. Okajima":
> > > > 
> > > > I don't know whether the fix is good to me or not yet. I will test your
> > > > fix, but I am busy now and my test will be a few weeks later. Other
> > > > people may want the fix soon. So I'd suggest you to reproduce the
> > > > problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
> > > > to reproduce the problem.
> > > > Of course, if you are sure the fix is correct, then you don't have to
> > > > wait for my test. Release it soon for other people.
> > > 
> > > Now I am going to able to run my local test.
> > > But V3 patch failed to apply onto v4.11.0.
> > > 
> > > Would you provide us two versions of the patch, one for v4.11 series and
> > > the other of v4.12-rcN?
> > 
> > For v4.12-rc2 the backport is here:
> > 
> > https://patchwork.freedesktop.org/patch/156990/
> 
> This fix seems to have got lost: we've been waiting a month,
> and 4.12 is now at rc3: please expedite the unexpedited :)

Don't worry, it's not lost. It was merged to drm-intel-fixes and thus is in the 
pipeline.

There were some unexpected delays getting fixes in, sorry for that.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-31 Thread Jani Nikula
On Tue, 30 May 2017, Hugh Dickins  wrote:
> On Mon, 22 May 2017, Joonas Lahtinen wrote:
>> On la, 2017-05-20 at 10:56 +0900, J. R. Okajima wrote:
>> > "J. R. Okajima":
>> > > 
>> > > I don't know whether the fix is good to me or not yet. I will test your
>> > > fix, but I am busy now and my test will be a few weeks later. Other
>> > > people may want the fix soon. So I'd suggest you to reproduce the
>> > > problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
>> > > to reproduce the problem.
>> > > Of course, if you are sure the fix is correct, then you don't have to
>> > > wait for my test. Release it soon for other people.
>> > 
>> > Now I am going to able to run my local test.
>> > But V3 patch failed to apply onto v4.11.0.
>> > 
>> > Would you provide us two versions of the patch, one for v4.11 series and
>> > the other of v4.12-rcN?
>> 
>> For v4.12-rc2 the backport is here:
>> 
>> https://patchwork.freedesktop.org/patch/156990/
>
> This fix seems to have got lost: we've been waiting a month,
> and 4.12 is now at rc3: please expedite the unexpedited :)

It's included in my latest pull request to Dave [1], should make it to
v4.12-rc4.

BR,
Jani.


[1] http://mid.mail-archive.com/877f0zlr5d.fsf@intel.com


-- 
Jani Nikula, Intel Open Source Technology Center


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-31 Thread Jani Nikula
On Tue, 30 May 2017, Hugh Dickins  wrote:
> On Mon, 22 May 2017, Joonas Lahtinen wrote:
>> On la, 2017-05-20 at 10:56 +0900, J. R. Okajima wrote:
>> > "J. R. Okajima":
>> > > 
>> > > I don't know whether the fix is good to me or not yet. I will test your
>> > > fix, but I am busy now and my test will be a few weeks later. Other
>> > > people may want the fix soon. So I'd suggest you to reproduce the
>> > > problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
>> > > to reproduce the problem.
>> > > Of course, if you are sure the fix is correct, then you don't have to
>> > > wait for my test. Release it soon for other people.
>> > 
>> > Now I am going to able to run my local test.
>> > But V3 patch failed to apply onto v4.11.0.
>> > 
>> > Would you provide us two versions of the patch, one for v4.11 series and
>> > the other of v4.12-rcN?
>> 
>> For v4.12-rc2 the backport is here:
>> 
>> https://patchwork.freedesktop.org/patch/156990/
>
> This fix seems to have got lost: we've been waiting a month,
> and 4.12 is now at rc3: please expedite the unexpedited :)

It's included in my latest pull request to Dave [1], should make it to
v4.12-rc4.

BR,
Jani.


[1] http://mid.mail-archive.com/877f0zlr5d.fsf@intel.com


-- 
Jani Nikula, Intel Open Source Technology Center


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-30 Thread Hugh Dickins
On Mon, 22 May 2017, Joonas Lahtinen wrote:
> On la, 2017-05-20 at 10:56 +0900, J. R. Okajima wrote:
> > "J. R. Okajima":
> > > 
> > > I don't know whether the fix is good to me or not yet. I will test your
> > > fix, but I am busy now and my test will be a few weeks later. Other
> > > people may want the fix soon. So I'd suggest you to reproduce the
> > > problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
> > > to reproduce the problem.
> > > Of course, if you are sure the fix is correct, then you don't have to
> > > wait for my test. Release it soon for other people.
> > 
> > Now I am going to able to run my local test.
> > But V3 patch failed to apply onto v4.11.0.
> > 
> > Would you provide us two versions of the patch, one for v4.11 series and
> > the other of v4.12-rcN?
> 
> For v4.12-rc2 the backport is here:
> 
> https://patchwork.freedesktop.org/patch/156990/

This fix seems to have got lost: we've been waiting a month,
and 4.12 is now at rc3: please expedite the unexpedited :)

Thanks,
Hugh


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-30 Thread Hugh Dickins
On Mon, 22 May 2017, Joonas Lahtinen wrote:
> On la, 2017-05-20 at 10:56 +0900, J. R. Okajima wrote:
> > "J. R. Okajima":
> > > 
> > > I don't know whether the fix is good to me or not yet. I will test your
> > > fix, but I am busy now and my test will be a few weeks later. Other
> > > people may want the fix soon. So I'd suggest you to reproduce the
> > > problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
> > > to reproduce the problem.
> > > Of course, if you are sure the fix is correct, then you don't have to
> > > wait for my test. Release it soon for other people.
> > 
> > Now I am going to able to run my local test.
> > But V3 patch failed to apply onto v4.11.0.
> > 
> > Would you provide us two versions of the patch, one for v4.11 series and
> > the other of v4.12-rcN?
> 
> For v4.12-rc2 the backport is here:
> 
> https://patchwork.freedesktop.org/patch/156990/

This fix seems to have got lost: we've been waiting a month,
and 4.12 is now at rc3: please expedite the unexpedited :)

Thanks,
Hugh


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-22 Thread Joonas Lahtinen
On la, 2017-05-20 at 10:56 +0900, J. R. Okajima wrote:
> "J. R. Okajima":
> > 
> > I don't know whether the fix is good to me or not yet. I will test your
> > fix, but I am busy now and my test will be a few weeks later. Other
> > people may want the fix soon. So I'd suggest you to reproduce the
> > problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
> > to reproduce the problem.
> > Of course, if you are sure the fix is correct, then you don't have to
> > wait for my test. Release it soon for other people.
> 
> Now I am going to able to run my local test.
> But V3 patch failed to apply onto v4.11.0.
> 
> Would you provide us two versions of the patch, one for v4.11 series and
> the other of v4.12-rcN?

For v4.12-rc2 the backport is here:

https://patchwork.freedesktop.org/patch/156990/

For quick testing on older kernels, just remove all lines with "_rcu"
in drivers/gpu/drm/i915/i915_gem_shrinker.c .

Regards, Joonas

PS: It'll help to subscribe to and track our mailing list at
intel-...@lists.freedesktop.org for future convenience.
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-22 Thread Joonas Lahtinen
On la, 2017-05-20 at 10:56 +0900, J. R. Okajima wrote:
> "J. R. Okajima":
> > 
> > I don't know whether the fix is good to me or not yet. I will test your
> > fix, but I am busy now and my test will be a few weeks later. Other
> > people may want the fix soon. So I'd suggest you to reproduce the
> > problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
> > to reproduce the problem.
> > Of course, if you are sure the fix is correct, then you don't have to
> > wait for my test. Release it soon for other people.
> 
> Now I am going to able to run my local test.
> But V3 patch failed to apply onto v4.11.0.
> 
> Would you provide us two versions of the patch, one for v4.11 series and
> the other of v4.12-rcN?

For v4.12-rc2 the backport is here:

https://patchwork.freedesktop.org/patch/156990/

For quick testing on older kernels, just remove all lines with "_rcu"
in drivers/gpu/drm/i915/i915_gem_shrinker.c .

Regards, Joonas

PS: It'll help to subscribe to and track our mailing list at
intel-...@lists.freedesktop.org for future convenience.
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-19 Thread J. R. Okajima
"J. R. Okajima":
> I don't know whether the fix is good to me or not yet. I will test your
> fix, but I am busy now and my test will be a few weeks later. Other
> people may want the fix soon. So I'd suggest you to reproduce the
> problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
> to reproduce the problem.
> Of course, if you are sure the fix is correct, then you don't have to
> wait for my test. Release it soon for other people.

Now I am going to able to run my local test.
But V3 patch failed to apply onto v4.11.0.

Would you provide us two versions of the patch, one for v4.11 series and
the other of v4.12-rcN?


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-19 Thread J. R. Okajima
"J. R. Okajima":
> I don't know whether the fix is good to me or not yet. I will test your
> fix, but I am busy now and my test will be a few weeks later. Other
> people may want the fix soon. So I'd suggest you to reproduce the
> problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
> to reproduce the problem.
> Of course, if you are sure the fix is correct, then you don't have to
> wait for my test. Release it soon for other people.

Now I am going to able to run my local test.
But V3 patch failed to apply onto v4.11.0.

Would you provide us two versions of the patch, one for v4.11 series and
the other of v4.12-rcN?


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-10 Thread Joonas Lahtinen
On ke, 2017-05-10 at 12:43 +0200, Andrea Arcangeli wrote:
> It works for me too. I'm running my workstation also with
> synchronize_rcu removed from i915_gem_shrink_all in addition to the
> above. Isn't the oom method invoked from reclaim context too? As far
> as I can tell synchronize_rcu can end up throttling on a background
> synchronize_rcu_expedited(), so it might end up in the same issue
> unless removed too.

Thanks for testing and spotting my bad grepping, I'll add your T-b and
s
end v3.

Regards, Joonas

> Tested-by: Andrea Arcangeli 
> 
> (I can't reproduce the lockups 100% of the time, but they never
> happened again with this patch and I happened to run the load that
> reproduces them a couple of times already with v4.11 and this patch
> applied)
> 
> It's also certainly improving performance by removing the
> synchronize_rcu_expedited from the _count methods where it was useless
> (in addition to unsafe).
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-10 Thread Joonas Lahtinen
On ke, 2017-05-10 at 12:43 +0200, Andrea Arcangeli wrote:
> It works for me too. I'm running my workstation also with
> synchronize_rcu removed from i915_gem_shrink_all in addition to the
> above. Isn't the oom method invoked from reclaim context too? As far
> as I can tell synchronize_rcu can end up throttling on a background
> synchronize_rcu_expedited(), so it might end up in the same issue
> unless removed too.

Thanks for testing and spotting my bad grepping, I'll add your T-b and
s
end v3.

Regards, Joonas

> Tested-by: Andrea Arcangeli 
> 
> (I can't reproduce the lockups 100% of the time, but they never
> happened again with this patch and I happened to run the load that
> reproduces them a couple of times already with v4.11 and this patch
> applied)
> 
> It's also certainly improving performance by removing the
> synchronize_rcu_expedited from the _count methods where it was useless
> (in addition to unsafe).
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-10 Thread Andrea Arcangeli
Hello,

On Tue, May 09, 2017 at 08:04:24PM -0700, Hugh Dickins wrote:
> On Mon, 8 May 2017, Joonas Lahtinen wrote:
> > On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> > > On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > > > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > > > Thanx for the reply.
> > > > > 
> > > > > Andrea Arcangeli:
> > > > > > 
> > > > > > Yes I already reported this, my original fix was way more efficient
> > > > > > (and also safer considering the above) than what landed upstream. My
> > > > > > feedback was ignored though.
> > > > > > 
> > > > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > > > 
> > > > > I see.
> > > > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd 
> > > > > and
> > > > > others all stopped working due to the synchronize_rcu_expedited call
> > > > > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > > > > an i915 user.
> > > > 
> > > > Filing a bug in freedesktop.org with all the details is the fastest way
> > > > of getting help. Without the bug (and with such little information as
> > > > the previous e-mail) it's hard to estimate the extent and nature of the
> > > > bug.
> > > > 
> > > > I've anyway gone and prepared a patch to drop the RCU sync completely
> > > > from shrinker phase, as discussed originally with Chris.
> > > 
> > > Is that a patch that will be suitable for 4.11-stable?  Please do post
> > > it here.  I had not experienced this i915-induced hang at all when
> > > Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> > > final I can get it fairly easily (I haven't tried Andrea's fix yet).
> > 
> > Please try:
> > 
> > https://patchwork.freedesktop.org/patch/154713/
> > 
> > If it works, a Tested-by: would be appreciated.
> 
> Yes, that works for me, thank you.
> 
> Tested-by: Hugh Dickins 
> 
> But the linked patch seems to be lacking a Reported-by (not me) tag,
> a Fixes tag, a Cc stable tag, and any indication in the Subject or
> commit message that this patch is something needed to fix hangs
> observed by several people - it just sounds like a minor cleanup.

It works for me too. I'm running my workstation also with
synchronize_rcu removed from i915_gem_shrink_all in addition to the
above. Isn't the oom method invoked from reclaim context too? As far
as I can tell synchronize_rcu can end up throttling on a background
synchronize_rcu_expedited(), so it might end up in the same issue
unless removed too.

Tested-by: Andrea Arcangeli 

(I can't reproduce the lockups 100% of the time, but they never
happened again with this patch and I happened to run the load that
reproduces them a couple of times already with v4.11 and this patch
applied)

It's also certainly improving performance by removing the
synchronize_rcu_expedited from the _count methods where it was useless
(in addition to unsafe).

Thanks,
Andrea


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-10 Thread Andrea Arcangeli
Hello,

On Tue, May 09, 2017 at 08:04:24PM -0700, Hugh Dickins wrote:
> On Mon, 8 May 2017, Joonas Lahtinen wrote:
> > On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> > > On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > > > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > > > Thanx for the reply.
> > > > > 
> > > > > Andrea Arcangeli:
> > > > > > 
> > > > > > Yes I already reported this, my original fix was way more efficient
> > > > > > (and also safer considering the above) than what landed upstream. My
> > > > > > feedback was ignored though.
> > > > > > 
> > > > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > > > 
> > > > > I see.
> > > > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd 
> > > > > and
> > > > > others all stopped working due to the synchronize_rcu_expedited call
> > > > > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > > > > an i915 user.
> > > > 
> > > > Filing a bug in freedesktop.org with all the details is the fastest way
> > > > of getting help. Without the bug (and with such little information as
> > > > the previous e-mail) it's hard to estimate the extent and nature of the
> > > > bug.
> > > > 
> > > > I've anyway gone and prepared a patch to drop the RCU sync completely
> > > > from shrinker phase, as discussed originally with Chris.
> > > 
> > > Is that a patch that will be suitable for 4.11-stable?  Please do post
> > > it here.  I had not experienced this i915-induced hang at all when
> > > Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> > > final I can get it fairly easily (I haven't tried Andrea's fix yet).
> > 
> > Please try:
> > 
> > https://patchwork.freedesktop.org/patch/154713/
> > 
> > If it works, a Tested-by: would be appreciated.
> 
> Yes, that works for me, thank you.
> 
> Tested-by: Hugh Dickins 
> 
> But the linked patch seems to be lacking a Reported-by (not me) tag,
> a Fixes tag, a Cc stable tag, and any indication in the Subject or
> commit message that this patch is something needed to fix hangs
> observed by several people - it just sounds like a minor cleanup.

It works for me too. I'm running my workstation also with
synchronize_rcu removed from i915_gem_shrink_all in addition to the
above. Isn't the oom method invoked from reclaim context too? As far
as I can tell synchronize_rcu can end up throttling on a background
synchronize_rcu_expedited(), so it might end up in the same issue
unless removed too.

Tested-by: Andrea Arcangeli 

(I can't reproduce the lockups 100% of the time, but they never
happened again with this patch and I happened to run the load that
reproduces them a couple of times already with v4.11 and this patch
applied)

It's also certainly improving performance by removing the
synchronize_rcu_expedited from the _count methods where it was useless
(in addition to unsafe).

Thanks,
Andrea


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-10 Thread Joonas Lahtinen
On ti, 2017-05-09 at 20:04 -0700, Hugh Dickins wrote:
> On Mon, 8 May 2017, Joonas Lahtinen wrote:
> > On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> > > On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > > > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > > > Thanx for the reply.
> > > > > 
> > > > > Andrea Arcangeli:
> > > > > > 
> > > > > > Yes I already reported this, my original fix was way more efficient
> > > > > > (and also safer considering the above) than what landed upstream. My
> > > > > > feedback was ignored though.
> > > > > > 
> > > > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > > > 
> > > > > I see.
> > > > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd 
> > > > > and
> > > > > others all stopped working due to the synchronize_rcu_expedited call
> > > > > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > > > > an i915 user.
> > > > 
> > > > Filing a bug in freedesktop.org with all the details is the fastest way
> > > > of getting help. Without the bug (and with such little information as
> > > > the previous e-mail) it's hard to estimate the extent and nature of the
> > > > bug.
> > > > 
> > > > I've anyway gone and prepared a patch to drop the RCU sync completely
> > > > from shrinker phase, as discussed originally with Chris.
> > > 
> > > Is that a patch that will be suitable for 4.11-stable?  Please do post
> > > it here.  I had not experienced this i915-induced hang at all when
> > > Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> > > final I can get it fairly easily (I haven't tried Andrea's fix yet).
> > 
> > Please try:
> > 
> > https://patchwork.freedesktop.org/patch/154713/
> > 
> > If it works, a Tested-by: would be appreciated.
> 
> Yes, that works for me, thank you.
> 
> Tested-by: Hugh Dickins 
> 
> But the linked patch seems to be lacking a Reported-by (not me) tag,
> a Fixes tag, a Cc stable tag, and any indication in the Subject or
> commit message that this patch is something needed to fix hangs
> observed by several people - it just sounds like a minor cleanup.

It is a patch that was agreed to be pushed anyway, so if it wouldn't
have resolved the problem, I'd have pushed it as is.

I'll add J. R. Okajima as Reported-by and refer to the bisected commit,
even though so far Freedesktop Bugzilla or intel-gfx mailing list has
no other reports.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-10 Thread Joonas Lahtinen
On ti, 2017-05-09 at 20:04 -0700, Hugh Dickins wrote:
> On Mon, 8 May 2017, Joonas Lahtinen wrote:
> > On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> > > On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > > > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > > > Thanx for the reply.
> > > > > 
> > > > > Andrea Arcangeli:
> > > > > > 
> > > > > > Yes I already reported this, my original fix was way more efficient
> > > > > > (and also safer considering the above) than what landed upstream. My
> > > > > > feedback was ignored though.
> > > > > > 
> > > > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > > > 
> > > > > I see.
> > > > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd 
> > > > > and
> > > > > others all stopped working due to the synchronize_rcu_expedited call
> > > > > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > > > > an i915 user.
> > > > 
> > > > Filing a bug in freedesktop.org with all the details is the fastest way
> > > > of getting help. Without the bug (and with such little information as
> > > > the previous e-mail) it's hard to estimate the extent and nature of the
> > > > bug.
> > > > 
> > > > I've anyway gone and prepared a patch to drop the RCU sync completely
> > > > from shrinker phase, as discussed originally with Chris.
> > > 
> > > Is that a patch that will be suitable for 4.11-stable?  Please do post
> > > it here.  I had not experienced this i915-induced hang at all when
> > > Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> > > final I can get it fairly easily (I haven't tried Andrea's fix yet).
> > 
> > Please try:
> > 
> > https://patchwork.freedesktop.org/patch/154713/
> > 
> > If it works, a Tested-by: would be appreciated.
> 
> Yes, that works for me, thank you.
> 
> Tested-by: Hugh Dickins 
> 
> But the linked patch seems to be lacking a Reported-by (not me) tag,
> a Fixes tag, a Cc stable tag, and any indication in the Subject or
> commit message that this patch is something needed to fix hangs
> observed by several people - it just sounds like a minor cleanup.

It is a patch that was agreed to be pushed anyway, so if it wouldn't
have resolved the problem, I'd have pushed it as is.

I'll add J. R. Okajima as Reported-by and refer to the bisected commit,
even though so far Freedesktop Bugzilla or intel-gfx mailing list has
no other reports.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-09 Thread Hugh Dickins
On Mon, 8 May 2017, Joonas Lahtinen wrote:
> On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> > On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > > Thanx for the reply.
> > > > 
> > > > Andrea Arcangeli:
> > > > > 
> > > > > Yes I already reported this, my original fix was way more efficient
> > > > > (and also safer considering the above) than what landed upstream. My
> > > > > feedback was ignored though.
> > > > > 
> > > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > > 
> > > > I see.
> > > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> > > > others all stopped working due to the synchronize_rcu_expedited call
> > > > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > > > an i915 user.
> > > 
> > > Filing a bug in freedesktop.org with all the details is the fastest way
> > > of getting help. Without the bug (and with such little information as
> > > the previous e-mail) it's hard to estimate the extent and nature of the
> > > bug.
> > > 
> > > I've anyway gone and prepared a patch to drop the RCU sync completely
> > > from shrinker phase, as discussed originally with Chris.
> > 
> > Is that a patch that will be suitable for 4.11-stable?  Please do post
> > it here.  I had not experienced this i915-induced hang at all when
> > Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> > final I can get it fairly easily (I haven't tried Andrea's fix yet).
> 
> Please try:
> 
> https://patchwork.freedesktop.org/patch/154713/
> 
> If it works, a Tested-by: would be appreciated.

Yes, that works for me, thank you.

Tested-by: Hugh Dickins 

But the linked patch seems to be lacking a Reported-by (not me) tag,
a Fixes tag, a Cc stable tag, and any indication in the Subject or
commit message that this patch is something needed to fix hangs
observed by several people - it just sounds like a minor cleanup.

Hugh

Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-09 Thread Hugh Dickins
On Mon, 8 May 2017, Joonas Lahtinen wrote:
> On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> > On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > > Thanx for the reply.
> > > > 
> > > > Andrea Arcangeli:
> > > > > 
> > > > > Yes I already reported this, my original fix was way more efficient
> > > > > (and also safer considering the above) than what landed upstream. My
> > > > > feedback was ignored though.
> > > > > 
> > > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > > 
> > > > I see.
> > > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> > > > others all stopped working due to the synchronize_rcu_expedited call
> > > > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > > > an i915 user.
> > > 
> > > Filing a bug in freedesktop.org with all the details is the fastest way
> > > of getting help. Without the bug (and with such little information as
> > > the previous e-mail) it's hard to estimate the extent and nature of the
> > > bug.
> > > 
> > > I've anyway gone and prepared a patch to drop the RCU sync completely
> > > from shrinker phase, as discussed originally with Chris.
> > 
> > Is that a patch that will be suitable for 4.11-stable?  Please do post
> > it here.  I had not experienced this i915-induced hang at all when
> > Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> > final I can get it fairly easily (I haven't tried Andrea's fix yet).
> 
> Please try:
> 
> https://patchwork.freedesktop.org/patch/154713/
> 
> If it works, a Tested-by: would be appreciated.

Yes, that works for me, thank you.

Tested-by: Hugh Dickins 

But the linked patch seems to be lacking a Reported-by (not me) tag,
a Fixes tag, a Cc stable tag, and any indication in the Subject or
commit message that this patch is something needed to fix hangs
observed by several people - it just sounds like a minor cleanup.

Hugh

Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-08 Thread Joonas Lahtinen
On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > Thanx for the reply.
> > > 
> > > Andrea Arcangeli:
> > > > 
> > > > Yes I already reported this, my original fix was way more efficient
> > > > (and also safer considering the above) than what landed upstream. My
> > > > feedback was ignored though.
> > > > 
> > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > 
> > > I see.
> > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> > > others all stopped working due to the synchronize_rcu_expedited call
> > > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > > an i915 user.
> > 
> > Filing a bug in freedesktop.org with all the details is the fastest way
> > of getting help. Without the bug (and with such little information as
> > the previous e-mail) it's hard to estimate the extent and nature of the
> > bug.
> > 
> > I've anyway gone and prepared a patch to drop the RCU sync completely
> > from shrinker phase, as discussed originally with Chris.
> 
> Is that a patch that will be suitable for 4.11-stable?  Please do post
> it here.  I had not experienced this i915-induced hang at all when
> Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> final I can get it fairly easily (I haven't tried Andrea's fix yet).

Please try:

https://patchwork.freedesktop.org/patch/154713/

If it works, a Tested-by: would be appreciated.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-08 Thread Joonas Lahtinen
On pe, 2017-05-05 at 14:57 -0700, Hugh Dickins wrote:
> On Fri, 5 May 2017, Joonas Lahtinen wrote:
> > On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > > Thanx for the reply.
> > > 
> > > Andrea Arcangeli:
> > > > 
> > > > Yes I already reported this, my original fix was way more efficient
> > > > (and also safer considering the above) than what landed upstream. My
> > > > feedback was ignored though.
> > > > 
> > > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > > 
> > > I see.
> > > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> > > others all stopped working due to the synchronize_rcu_expedited call
> > > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > > an i915 user.
> > 
> > Filing a bug in freedesktop.org with all the details is the fastest way
> > of getting help. Without the bug (and with such little information as
> > the previous e-mail) it's hard to estimate the extent and nature of the
> > bug.
> > 
> > I've anyway gone and prepared a patch to drop the RCU sync completely
> > from shrinker phase, as discussed originally with Chris.
> 
> Is that a patch that will be suitable for 4.11-stable?  Please do post
> it here.  I had not experienced this i915-induced hang at all when
> Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
> final I can get it fairly easily (I haven't tried Andrea's fix yet).

Please try:

https://patchwork.freedesktop.org/patch/154713/

If it works, a Tested-by: would be appreciated.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-05 Thread J. R. Okajima
Joonas Lahtinen:
> Filing a bug in freedesktop.org with all the details is the fastest way
> of getting help. Without the bug (and with such little information as
> the previous e-mail) it's hard to estimate the extent and nature of the
> bug.

My original report was
http://marc.info/?l=linux-kernel=149313183203325=2

The report contained
- kernel command line options
- lockdep msg
- call trace
but it didn't look drm/i915 shrinker is related. It was git-bisect which
lead me to drm/i915 shrinker.


> I've anyway gone and prepared a patch to drop the RCU sync completely
> from shrinker phase, as discussed originally with Chris.

Thank you.
I don't know whether the fix is good to me or not yet. I will test your
fix, but I am busy now and my test will be a few weeks later. Other
people may want the fix soon. So I'd suggest you to reproduce the
problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
to reproduce the problem.
Of course, if you are sure the fix is correct, then you don't have to
wait for my test. Release it soon for other people.


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-05 Thread J. R. Okajima
Joonas Lahtinen:
> Filing a bug in freedesktop.org with all the details is the fastest way
> of getting help. Without the bug (and with such little information as
> the previous e-mail) it's hard to estimate the extent and nature of the
> bug.

My original report was
http://marc.info/?l=linux-kernel=149313183203325=2

The report contained
- kernel command line options
- lockdep msg
- call trace
but it didn't look drm/i915 shrinker is related. It was git-bisect which
lead me to drm/i915 shrinker.


> I've anyway gone and prepared a patch to drop the RCU sync completely
> from shrinker phase, as discussed originally with Chris.

Thank you.
I don't know whether the fix is good to me or not yet. I will test your
fix, but I am busy now and my test will be a few weeks later. Other
people may want the fix soon. So I'd suggest you to reproduce the
problem on your side. I guess "mem=1G" or "mem=512M" will make it easier
to reproduce the problem.
Of course, if you are sure the fix is correct, then you don't have to
wait for my test. Release it soon for other people.


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-05 Thread Hugh Dickins
On Fri, 5 May 2017, Joonas Lahtinen wrote:
> On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > Thanx for the reply.
> > 
> > Andrea Arcangeli:
> > > 
> > > Yes I already reported this, my original fix was way more efficient
> > > (and also safer considering the above) than what landed upstream. My
> > > feedback was ignored though.
> > > 
> > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > 
> > I see.
> > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> > others all stopped working due to the synchronize_rcu_expedited call
> > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > an i915 user.
> 
> Filing a bug in freedesktop.org with all the details is the fastest way
> of getting help. Without the bug (and with such little information as
> the previous e-mail) it's hard to estimate the extent and nature of the
> bug.
> 
> I've anyway gone and prepared a patch to drop the RCU sync completely
> from shrinker phase, as discussed originally with Chris.

Is that a patch that will be suitable for 4.11-stable?  Please do post
it here.  I had not experienced this i915-induced hang at all when
Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
final I can get it fairly easily (I haven't tried Andrea's fix yet).

Hugh


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-05 Thread Hugh Dickins
On Fri, 5 May 2017, Joonas Lahtinen wrote:
> On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> > Thanx for the reply.
> > 
> > Andrea Arcangeli:
> > > 
> > > Yes I already reported this, my original fix was way more efficient
> > > (and also safer considering the above) than what landed upstream. My
> > > feedback was ignored though.
> > > 
> > > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> > 
> > I see.
> > Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> > others all stopped working due to the synchronize_rcu_expedited call
> > from i915_gem_shrinker_count. It is definitly a show stopper for me as
> > an i915 user.
> 
> Filing a bug in freedesktop.org with all the details is the fastest way
> of getting help. Without the bug (and with such little information as
> the previous e-mail) it's hard to estimate the extent and nature of the
> bug.
> 
> I've anyway gone and prepared a patch to drop the RCU sync completely
> from shrinker phase, as discussed originally with Chris.

Is that a patch that will be suitable for 4.11-stable?  Please do post
it here.  I had not experienced this i915-induced hang at all when
Andrea first mentioned it, nor even on 4.11-rc8; but now with 4.11
final I can get it fairly easily (I haven't tried Andrea's fix yet).

Hugh


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-05 Thread Joonas Lahtinen
On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> Thanx for the reply.
> 
> Andrea Arcangeli:
> > 
> > Yes I already reported this, my original fix was way more efficient
> > (and also safer considering the above) than what landed upstream. My
> > feedback was ignored though.
> > 
> > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> 
> I see.
> Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> others all stopped working due to the synchronize_rcu_expedited call
> from i915_gem_shrinker_count. It is definitly a show stopper for me as
> an i915 user.

Filing a bug in freedesktop.org with all the details is the fastest way
of getting help. Without the bug (and with such little information as
the previous e-mail) it's hard to estimate the extent and nature of the
bug.

I've anyway gone and prepared a patch to drop the RCU sync completely
from shrinker phase, as discussed originally with Chris.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-05-05 Thread Joonas Lahtinen
On ma, 2017-05-01 at 11:05 +0900, J. R. Okajima wrote:
> Thanx for the reply.
> 
> Andrea Arcangeli:
> > 
> > Yes I already reported this, my original fix was way more efficient
> > (and also safer considering the above) than what landed upstream. My
> > feedback was ignored though.
> > 
> > https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html
> 
> I see.
> Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
> others all stopped working due to the synchronize_rcu_expedited call
> from i915_gem_shrinker_count. It is definitly a show stopper for me as
> an i915 user.

Filing a bug in freedesktop.org with all the details is the fastest way
of getting help. Without the bug (and with such little information as
the previous e-mail) it's hard to estimate the extent and nature of the
bug.

I've anyway gone and prepared a patch to drop the RCU sync completely
from shrinker phase, as discussed originally with Chris.

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-04-30 Thread J. R. Okajima
Thanx for the reply.

Andrea Arcangeli:
> Yes I already reported this, my original fix was way more efficient
> (and also safer considering the above) than what landed upstream. My
> feedback was ignored though.
>
> https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html

I see.
Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
others all stopped working due to the synchronize_rcu_expedited call
from i915_gem_shrinker_count. It is definitly a show stopper for me as
an i915 user.

It was a few weeks ago when you posted. It is a pity the fix was not
merged before v4.11 comes out. I know v4.11 will appear soon. So I'd ask
i915 developers, would you test Andrea Arcangeli's fix and release it as
v4.11.1 as soon as possible?


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-04-30 Thread J. R. Okajima
Thanx for the reply.

Andrea Arcangeli:
> Yes I already reported this, my original fix was way more efficient
> (and also safer considering the above) than what landed upstream. My
> feedback was ignored though.
>
> https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html

I see.
Actually on my test system for v4.11-rc8, kthreadd, kworker, kswapd and
others all stopped working due to the synchronize_rcu_expedited call
from i915_gem_shrinker_count. It is definitly a show stopper for me as
an i915 user.

It was a few weeks ago when you posted. It is a pity the fix was not
merged before v4.11 comes out. I know v4.11 will appear soon. So I'd ask
i915 developers, would you test Andrea Arcangeli's fix and release it as
v4.11.1 as soon as possible?


J. R. Okajima


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-04-30 Thread Andrea Arcangeli
On Sun, Apr 30, 2017 at 03:07:58PM +0900, J. R. Okajima wrote:
> Hello,
> 
> Since v4.11-rc7 I can see the workqueue stops on my development/test system.
> Git-bisecting tells me the suspicious commit is
>   c053b5a 2017-04-11 drm/i915: Don't call synchronize_rcu_expedited under 
> struct_mutex
> 
> I am not sure whether this is the real cause or not of my problem, but I
> have a question.
> By the commit, the shrinker handlers ->scan_objects() and
> ->count_objects() both calls synchronize_rcu_expedited()
> unconditionally. Is it a legal RCU bahavour?

It's actually not legal because the workqueue RCU uses is not reclaim
safe, simply lockdep is unable to notice it because RCU won't use
flush_workqueue to wait for completion but it waits a wakeup from the
workqueue instead (and lockdep can't possibly notice that).

To fix it and allow both RCU and flush_workqueue (both needed to wait
the memory to be freed), i915 and RCU must both start using their own
private workqueue and not share the system-wide one and then set
WQ_MEM_RECLAIM on their private workqueue. (and of course they should
only call it when they're not recursing on the struct mutex)

The alternative is to drop it all and behave like in the mutex
recursion case. However reclaim cannot possibly throttle on the memory
freeing externally unless RCU is changed to stop using the system
workqueue so the idea that the throttling can be offloaded to the
reclaim code doesn't move the needle in terms of being able to
throttle. Perhaps no throttling is necessary at all and we can just go
inaccurate and it'll work fine though.

> I know dev->struct_mutex is unlocked now, but before the commit, these
> two handlers were not calling synchronize_rcu_expedited().

Yes I already reported this, my original fix was way more efficient
(and also safer considering the above) than what landed upstream. My
feedback was ignored though.

https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html

Thanks,
Andrea


Re: Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-04-30 Thread Andrea Arcangeli
On Sun, Apr 30, 2017 at 03:07:58PM +0900, J. R. Okajima wrote:
> Hello,
> 
> Since v4.11-rc7 I can see the workqueue stops on my development/test system.
> Git-bisecting tells me the suspicious commit is
>   c053b5a 2017-04-11 drm/i915: Don't call synchronize_rcu_expedited under 
> struct_mutex
> 
> I am not sure whether this is the real cause or not of my problem, but I
> have a question.
> By the commit, the shrinker handlers ->scan_objects() and
> ->count_objects() both calls synchronize_rcu_expedited()
> unconditionally. Is it a legal RCU bahavour?

It's actually not legal because the workqueue RCU uses is not reclaim
safe, simply lockdep is unable to notice it because RCU won't use
flush_workqueue to wait for completion but it waits a wakeup from the
workqueue instead (and lockdep can't possibly notice that).

To fix it and allow both RCU and flush_workqueue (both needed to wait
the memory to be freed), i915 and RCU must both start using their own
private workqueue and not share the system-wide one and then set
WQ_MEM_RECLAIM on their private workqueue. (and of course they should
only call it when they're not recursing on the struct mutex)

The alternative is to drop it all and behave like in the mutex
recursion case. However reclaim cannot possibly throttle on the memory
freeing externally unless RCU is changed to stop using the system
workqueue so the idea that the throttling can be offloaded to the
reclaim code doesn't move the needle in terms of being able to
throttle. Perhaps no throttling is necessary at all and we can just go
inaccurate and it'll work fine though.

> I know dev->struct_mutex is unlocked now, but before the commit, these
> two handlers were not calling synchronize_rcu_expedited().

Yes I already reported this, my original fix was way more efficient
(and also safer considering the above) than what landed upstream. My
feedback was ignored though.

https://lists.freedesktop.org/archives/intel-gfx/2017-April/125414.html

Thanks,
Andrea


Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-04-30 Thread J. R. Okajima
Hello,

Since v4.11-rc7 I can see the workqueue stops on my development/test system.
Git-bisecting tells me the suspicious commit is
c053b5a 2017-04-11 drm/i915: Don't call synchronize_rcu_expedited under 
struct_mutex

I am not sure whether this is the real cause or not of my problem, but I
have a question.
By the commit, the shrinker handlers ->scan_objects() and
->count_objects() both calls synchronize_rcu_expedited()
unconditionally. Is it a legal RCU bahavour?

I know dev->struct_mutex is unlocked now, but before the commit, these
two handlers were not calling synchronize_rcu_expedited().


J. R. Okajima


Q. drm/i915 shrinker, synchronize_rcu_expedited() from handlers

2017-04-30 Thread J. R. Okajima
Hello,

Since v4.11-rc7 I can see the workqueue stops on my development/test system.
Git-bisecting tells me the suspicious commit is
c053b5a 2017-04-11 drm/i915: Don't call synchronize_rcu_expedited under 
struct_mutex

I am not sure whether this is the real cause or not of my problem, but I
have a question.
By the commit, the shrinker handlers ->scan_objects() and
->count_objects() both calls synchronize_rcu_expedited()
unconditionally. Is it a legal RCU bahavour?

I know dev->struct_mutex is unlocked now, but before the commit, these
two handlers were not calling synchronize_rcu_expedited().


J. R. Okajima