Re: [Intel-gfx] [PATCH 4/6] drm/i915: Set our shrinker->batch to 4096 (~16MiB)

2017-09-27 Thread Chris Wilson
Quoting Joonas Lahtinen (2017-09-27 09:38:20)
> On Tue, 2017-09-26 at 16:02 +0100, Chris Wilson wrote:
> > Quoting Joonas Lahtinen (2017-09-20 14:28:40)
> > > On Wed, 2017-08-16 at 14:55 +0100, Chris Wilson wrote:
> > > > Quoting Joonas Lahtinen (2017-08-16 14:39:00)
> > > > > On Sat, 2017-08-12 at 12:51 +0100, Chris Wilson wrote:
> > > > > > Prefer to defer activating our GEM shrinker until we have a few
> > > > > > megabytes to free; or we have accumulated sufficient mempressure by
> > > > > > deferring the reclaim to force a shrink. The intent is that because
> > > > > > our objects may typically be large, we are too effective at shrinking
> > > > > > and are not rewarded for freeing more pages than the batch. It will
> > > > > > also defer the initial shrinking to hopefully put it at a lower
> > > > > > priority than say the buffer cache (although it will balance out over
> > > > > > a number of reclaims, with GEM being more bursty).
> > > > > > 
> > > > > > Signed-off-by: Chris Wilson 
> > > > > > ---
> > > > > >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 1 +
> > > > > >  1 file changed, 1 insertion(+)
> > > > > > 
> > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > > > > index 5b8bc0e4f336..8bb17e9a52de 100644
> > > > > > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > > > > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > > > > @@ -461,6 +461,7 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv)
> > > > > >   dev_priv->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
> > > > > >   dev_priv->mm.shrinker.count_objects = i915_gem_shrinker_count;
> > > > > >   dev_priv->mm.shrinker.seeks = DEFAULT_SEEKS;
> > > > > > + dev_priv->mm.shrinker.batch = 4096;
> > > > > 
> > > > > Did you try how this alone affects the runtime of two consecutive
> > > > > gem.testlist runs? Is there some specific test/usecase that benefits
> > > > > from this? We'd be the first to set this to a non-default value;
> > > > > md/raid5.c sets it to 128, which is what the default (0) already means.
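
[For illustration only, not part of the patch or the thread: a condensed sketch of how vmscan consumes shrinker->batch, loosely following mm/vmscan.c's do_shrink_slab() of this era with the deferral bookkeeping elided. It shows why a batch of 0 behaves like md/raid5.c's explicit 128 (SHRINK_BATCH), and why 4096 works out to roughly 16 MiB for i915, which reports pages rather than objects.]

/*
 * Condensed sketch of how vmscan drives a shrinker (deferral and
 * error handling elided; not the real do_shrink_slab()).  The batch
 * is in whatever units count_objects() reports (pages for i915),
 * so 4096 * 4 KiB = ~16 MiB per scan_objects() call.
 */
static unsigned long sketch_shrink_slab(struct shrink_control *sc,
					struct shrinker *shrinker)
{
	unsigned long freeable = shrinker->count_objects(shrinker, sc);
	/* batch == 0 falls back to SHRINK_BATCH (128) */
	long batch_size = shrinker->batch ? shrinker->batch : SHRINK_BATCH;
	/* stand-in for vmscan's proportional scan target */
	long total_scan = freeable / 2;
	unsigned long freed = 0;

	while (total_scan >= batch_size) {
		unsigned long ret;

		sc->nr_to_scan = batch_size;
		ret = shrinker->scan_objects(shrinker, sc);
		if (ret == SHRINK_STOP)
			break;
		freed += ret;
		total_scan -= batch_size;
	}
	return freed;
}

A larger batch therefore means fewer but bigger scan_objects() calls, and no call at all until the scan target accumulates past the batch, which is the deferral effect the commit message aims for.
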
> > > > 
> > > > My testing was trying to play a game that was hitting swap on an old
> > > > hdd, so it was not very quantifiable, and vmscan is very unintuitive.
> > > > 
> > > > Note also that we are special in that we don't report objects but pages.
> > > > Not that it makes much difference: upon reclaim, every slab is basically
> > > > asked to give up some percentage of what it reports, with some hysteresis
> > > > thrown in on top.
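
[To make the "pages, not objects" point concrete, a minimal sketch of the callback contract; count_freeable_pages() and release_pages_up_to() are hypothetical stand-ins, not the driver's functions.]

/*
 * Illustrative only: i915 reports pages rather than objects, so both
 * callbacks and shrinker->batch are in units of pages.  The helpers
 * below are hypothetical stand-ins for the real i915 code.
 */
static unsigned long
example_count(struct shrinker *shrinker, struct shrink_control *sc)
{
	/* e.g. sum of obj->base.size >> PAGE_SHIFT over the freeable lists */
	return count_freeable_pages();
}

static unsigned long
example_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
	/* unbind/purge objects until about sc->nr_to_scan pages are released */
	return release_pages_up_to(sc->nr_to_scan);
}
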
> > > > 
> > > > The only way we can do anything here is to throw it at lots of systems
> > > > and see how that helps. My gut feeling says that the batch size should
> > > > be approximately the typical object size in the freeable list, to try to
> > > > reduce the amount of inefficient work. Now, the value is read before
> > > > scan->count is called, but we can always improve the estimate for the
> > > > next pass.
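
[One possible shape of the "improve the estimate for the next pass" idea, purely hypothetical and not what was merged (the merged patch keeps a fixed 4096): track the average object size while counting and feed it back into shrinker->batch, so the next reclaim pass batches roughly one typical object at a time. The list and field names below are assumptions for illustration.]

/*
 * Hypothetical sketch only: tune shrinker->batch from the average
 * object size seen while counting.  Locking elided; the list and
 * field names are assumptions, not the actual i915 structures.
 */
static unsigned long
example_count_and_tune(struct shrinker *shrinker, struct shrink_control *sc)
{
	struct drm_i915_private *i915 =
		container_of(shrinker, struct drm_i915_private, mm.shrinker);
	struct drm_i915_gem_object *obj;
	unsigned long pages = 0, objects = 0;

	list_for_each_entry(obj, &i915->mm.unbound_list, global_link) {
		pages += obj->base.size >> PAGE_SHIFT;
		objects++;
	}

	/*
	 * The batch is read by vmscan before the next count/scan cycle,
	 * so this only affects the following pass, which is all that is
	 * needed here.
	 */
	if (objects)
		shrinker->batch = max_t(unsigned long, pages / objects,
					SHRINK_BATCH);

	return pages;
}
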
> > > 
> > > For documentation purposes, from IRC, this is:
> > > 
> > > Reviewed-by: Joonas Lahtinen 
> > 
> > And for reference, I took a stab at measuring vmpressure with gem_syslatency
> > https://patchwork.freedesktop.org/patch/178777/
> 
> And, did we get any results? :)

Elsewhere I reported that the only thing that made a dramatic
difference was mlock_vma_pages(). That patch completely eradicates the
system stalls under vmpressure with memory full of i915 objects and fs
buffer cache.
-Chris


Re: [Intel-gfx] [PATCH 4/6] drm/i915: Set our shrinker->batch to 4096 (~16MiB)

2017-09-27 Thread Joonas Lahtinen
On Tue, 2017-09-26 at 16:02 +0100, Chris Wilson wrote:
> Quoting Joonas Lahtinen (2017-09-20 14:28:40)
> > On Wed, 2017-08-16 at 14:55 +0100, Chris Wilson wrote:
> > > Quoting Joonas Lahtinen (2017-08-16 14:39:00)
> > > > On Sat, 2017-08-12 at 12:51 +0100, Chris Wilson wrote:
> > > > > Prefer to defer activating our GEM shrinker until we have a few
> > > > > megabytes to free; or we have accumulated sufficient mempressure by
> > > > > deferring the reclaim to force a shrink. The intent is that because
> > > > > our objects may typically be large, we are too effective at shrinking
> > > > > and are not rewarded for freeing more pages than the batch. It will
> > > > > also defer the initial shrinking to hopefully put it at a lower
> > > > > priority than say the buffer cache (although it will balance out over
> > > > > a number of reclaims, with GEM being more bursty).
> > > > > 
> > > > > Signed-off-by: Chris Wilson 
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 1 +
> > > > >  1 file changed, 1 insertion(+)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > > > index 5b8bc0e4f336..8bb17e9a52de 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > > > @@ -461,6 +461,7 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv)
> > > > >   dev_priv->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
> > > > >   dev_priv->mm.shrinker.count_objects = i915_gem_shrinker_count;
> > > > >   dev_priv->mm.shrinker.seeks = DEFAULT_SEEKS;
> > > > > + dev_priv->mm.shrinker.batch = 4096;
> > > > 
> > > > Did you try how this alone affects the runtime of two consecutive
> > > > gem.testlist runs? Is there some specific test/usecase that benefits
> > > > from this? We'd be the first to set this to a non-default value;
> > > > md/raid5.c sets it to 128, which is what the default (0) already means.
> > > 
> > > My testing was trying to play a game that was hitting swap on an old
> > > hdd, so it was not very quantifiable, and vmscan is very unintuitive.
> > > 
> > > Note also that we are special in that we don't report objects but pages.
> > > Not that it makes much difference: upon reclaim, every slab is basically
> > > asked to give up some percentage of what it reports, with some hysteresis
> > > thrown in on top.
> > > 
> > > The only way we can do anything here is to throw it at lots of systems
> > > and see how that helps. My gut feeling says that the batch size should
> > > be approximately the typical object size in the freeable list, to try to
> > > reduce the amount of inefficient work. Now, the value is read before
> > > scan->count is called, but we can always improve the estimate for the
> > > next pass.
> > 
> > For documentation purposes, from IRC, this is:
> > 
> > Reviewed-by: Joonas Lahtinen 
> 
> And for reference, I took a stab at measuring vmpressure with gem_syslatency
> https://patchwork.freedesktop.org/patch/178777/

And, did we get any results? :)

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation


Re: [Intel-gfx] [PATCH 4/6] drm/i915: Set our shrinker->batch to 4096 (~16MiB)

2017-09-26 Thread Chris Wilson
Quoting Joonas Lahtinen (2017-09-20 14:28:40)
> On Wed, 2017-08-16 at 14:55 +0100, Chris Wilson wrote:
> > Quoting Joonas Lahtinen (2017-08-16 14:39:00)
> > > On Sat, 2017-08-12 at 12:51 +0100, Chris Wilson wrote:
> > > > Prefer to defer activating our GEM shrinker until we have a few
> > > > megabytes to free; or we have accumulated sufficient mempressure by
> > > > deferring the reclaim to force a shrink. The intent is that because our
> > > > objects may typically be large, we are too effective at shrinking and
> > > > are not rewarded for freeing more pages than the batch. It will also
> > > > defer the initial shrinking to hopefully put it at a lower priority than
> > > > say the buffer cache (although it will balance out over a number of
> > > > reclaims, with GEM being more bursty).
> > > > 
> > > > Signed-off-by: Chris Wilson 
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 1 +
> > > >  1 file changed, 1 insertion(+)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > > index 5b8bc0e4f336..8bb17e9a52de 100644
> > > > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > > @@ -461,6 +461,7 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv)
> > > >   dev_priv->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
> > > >   dev_priv->mm.shrinker.count_objects = i915_gem_shrinker_count;
> > > >   dev_priv->mm.shrinker.seeks = DEFAULT_SEEKS;
> > > > + dev_priv->mm.shrinker.batch = 4096;
> > > 
> > > Did you try how this alone affects the runtime of two consecutive
> > > gem.testlist runs? Is there some specific test/usecase that benefits
> > > from this? We'd be the first to set this to a non-default value;
> > > md/raid5.c sets it to 128, which is what the default (0) already means.
> > 
> > My testing was trying to play a game that was hitting swap on an old
> > hdd, so it was not very quantifiable, and vmscan is very unintuitive.
> > 
> > Note also that we are special in that we don't report objects but pages.
> > Not that it makes much difference: upon reclaim, every slab is basically
> > asked to give up some percentage of what it reports, with some hysteresis
> > thrown in on top.
> > 
> > The only way we can do anything here is to throw it at lots of systems
> > and see how that helps. My gut feeling says that the batch size should
> > be approximately the typical object size in the freeable list, to try to
> > reduce the amount of inefficient work. Now, the value is read before
> > scan->count is called, but we can always improve the estimate for the
> > next pass.
> 
> For documentation purposes, from IRC, this is:
> 
> Reviewed-by: Joonas Lahtinen 

And for reference, I took a stab at measuring vmpressure with gem_syslatency
https://patchwork.freedesktop.org/patch/178777/
-Chris


Re: [Intel-gfx] [PATCH 4/6] drm/i915: Set our shrinker->batch to 4096 (~16MiB)

2017-09-20 Thread Joonas Lahtinen
On Wed, 2017-08-16 at 14:55 +0100, Chris Wilson wrote:
> Quoting Joonas Lahtinen (2017-08-16 14:39:00)
> > On Sat, 2017-08-12 at 12:51 +0100, Chris Wilson wrote:
> > > Prefer to defer activating our GEM shrinker until we have a few
> > > megabytes to free; or we have accumulated sufficient mempressure by
> > > deferring the reclaim to force a shrink. The intent is that because our
> > > objects may typically be large, we are too effective at shrinking and
> > > are not rewarded for freeing more pages than the batch. It will also
> > > defer the initial shrinking to hopefully put it at a lower priority than
> > > say the buffer cache (although it will balance out over a number of
> > > reclaims, with GEM being more bursty).
> > > 
> > > Signed-off-by: Chris Wilson 
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem_shrinker.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_shrinker.c b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > index 5b8bc0e4f336..8bb17e9a52de 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_shrinker.c
> > > @@ -461,6 +461,7 @@ void i915_gem_shrinker_init(struct drm_i915_private *dev_priv)
> > >   dev_priv->mm.shrinker.scan_objects = i915_gem_shrinker_scan;
> > >   dev_priv->mm.shrinker.count_objects = i915_gem_shrinker_count;
> > >   dev_priv->mm.shrinker.seeks = DEFAULT_SEEKS;
> > > + dev_priv->mm.shrinker.batch = 4096;
> > 
> > Did you try how this alone affects the runtime of two consecutive
> > gem.testlist runs? Is there some specific test/usecase that benefits
> > from this? We'd be the first to set this to a non-default value;
> > md/raid5.c sets it to 128, which is what the default (0) already means.
> 
> My testing was trying to play a game that was hitting swap on an old
> hdd, so it was not very quantifiable, and vmscan is very unintuitive.
> 
> Note also that we are special in that we don't report objects but pages.
> Not that it makes much difference: upon reclaim, every slab is basically
> asked to give up some percentage of what it reports, with some hysteresis
> thrown in on top.
> 
> The only way we can do anything here is to throw it at lots of systems
> and see how that helps. My gut feeling says that the batch size should
> be approximately the typical object size in the freeable list, to try to
> reduce the amount of inefficient work. Now, the value is read before
> scan->count is called, but we can always improve the estimate for the
> next pass.

For documentation purposes, from IRC, this is:

Reviewed-by: Joonas Lahtinen 

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation