Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] dma-buf: add caching of sg_table

2018-08-14 Thread Daniel Vetter
On Tue, Aug 07, 2018 at 04:13:53PM +0100, Tvrtko Ursulin wrote:
> 
> On 07/08/2018 14:21, Daniel Vetter wrote:
> > On Wed, Jul 18, 2018 at 03:24:26PM +0200, Christian König wrote:
> > > Hi Daniel,
> > > 
> > > Am 18.07.2018 um 14:07 schrieb Patchwork:
> > > > == Series Details ==
> > > > 
> > > > Series: series starting with [1/4] dma-buf: add caching of sg_table
> > > > URL   : https://patchwork.freedesktop.org/series/46778/
> > > > State : failure
> > > > [SNIP]
> > > 
> > > it looks like I'm a step further understanding the problems which come 
> > > with
> > > this change.
> > > 
> > > I've more or less audited all use cases and think that only i915 is left
> > > with the following lock inversion: 
> > > https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9705/fi-cfl-8700k/igt@gem_mmap_...@basic-small-bo-tiledx.html
> > > 
> > > Now my question is what is >mm.lock used for and why do you guys call
> > > dma_buf_map_attachment() while holding it?
> > 
> > obj->mm.lock is the lock to protect getting at the backing storage.
> > i915_gem_object_get_pages and _put_pages are the relevant functions.
> > 
> > Existing paths want to pin the backing storage while holding the
> > reservation lock. And your new path needs to do the inverse since
> > dma_buf_map_attachment now also requires the reservation lock. And that is
> > obviously called from within the dma-buf importer version of get_pages.
> > 
> > I think there's 2 solutions:
> > 
> > - Merge obj->mm.lock and the reservation lock. Probably the cleaner
> >solution, but likely more work.
> > 
> > - Make sure the obj->mm.lock always nests within the reservation lock, and
> >grab the reservation lock anywhere it's not yet grabbed. Then you can
> >use the dma_buf_map_attachment_locked variant in
> >i915_gem_object_get_pages_dmabuf to avoid the locking inversion. This
> >would essentially make the obj->mm.lock fully redundant.
> > 
> > Either way is going to be quite a bit of work. I expect that you need to
> > replace all the cases of dma_buf_map_attachment in i915 with
> > dma_buf_map_attachment_locked, and adjust the entire callchain to the new
> > locking scheme.
> > 
> > The real trouble here imo is that i915 CI is just the canary, I expect a
> > bunch of other drivers will also look at an inverted locking hierarchy if
> > dma_buf_map_attachment needs the reservation lock. And there's no
> > convenient CI for them, and code audit won't cut it really (at least I'm
> > too stupid to keep the locking hierarchy of an entire driver in my head).
> 
> We chatted about this on #intel-gfx and concluded that either solution
> derives to replacing the obj->mm.lock with the reservation lock. And that is
> problematic for i915, both from the reason of a general direction towards
> more fine-grained locking, and also issue that reservation lock needs to be
> avoided under the shrinker path (we lock obj->mm.lock when dropping the
> backing store there).

So the way this works for other drivers (well atm there's only ttm ones)
is that they try-lock the reservation lock when trying to evict stuff from
the shrinker. Would be good to elaborate on why we can't use that
approach ..

> I proposed that maybe we could re-jig how we use obj->mm.lock a bit, to
> ensure backing store vfunc (get_pages) is not called under it (although I
> haven't thought it fully through it may be possible without too significant
> drawbacks), but Chris also has some patches which may work around this in a
> different way. So I'll wait to see those first.

.. or is that Chris' work?

Honest question since I'm entirely out of the loop here on the i915 side.

> On whether or not reservation lock is the right lock to use from dma-buf for
> this purpose I'll leave other guys to comment - I am not fully into the
> details of dma-buf design.

If we want to do dynamic cross-device buffer management, which is
Christian's goal here, then we must somehow harmonize the locking schemes
used for buffer management, at least for backing storage handling. That's
the entire challenge of this undertaking, and the big trouble is that we
don't have a surplus on people who understand more than their own drivers
buffer management code and locking scheme. That's the information I'm
trying to help extract from all involved parties here.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] dma-buf: add caching of sg_table

2018-08-07 Thread Tvrtko Ursulin


On 07/08/2018 14:21, Daniel Vetter wrote:

On Wed, Jul 18, 2018 at 03:24:26PM +0200, Christian König wrote:

Hi Daniel,

Am 18.07.2018 um 14:07 schrieb Patchwork:

== Series Details ==

Series: series starting with [1/4] dma-buf: add caching of sg_table
URL   : https://patchwork.freedesktop.org/series/46778/
State : failure
[SNIP]


it looks like I'm a step further understanding the problems which come with
this change.

I've more or less audited all use cases and think that only i915 is left
with the following lock inversion: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9705/fi-cfl-8700k/igt@gem_mmap_...@basic-small-bo-tiledx.html

Now my question is what is >mm.lock used for and why do you guys call
dma_buf_map_attachment() while holding it?


obj->mm.lock is the lock to protect getting at the backing storage.
i915_gem_object_get_pages and _put_pages are the relevant functions.

Existing paths want to pin the backing storage while holding the
reservation lock. And your new path needs to do the inverse since
dma_buf_map_attachment now also requires the reservation lock. And that is
obviously called from within the dma-buf importer version of get_pages.

I think there's 2 solutions:

- Merge obj->mm.lock and the reservation lock. Probably the cleaner
   solution, but likely more work.

- Make sure the obj->mm.lock always nests within the reservation lock, and
   grab the reservation lock anywhere it's not yet grabbed. Then you can
   use the dma_buf_map_attachment_locked variant in
   i915_gem_object_get_pages_dmabuf to avoid the locking inversion. This
   would essentially make the obj->mm.lock fully redundant.

Either way is going to be quite a bit of work. I expect that you need to
replace all the cases of dma_buf_map_attachment in i915 with
dma_buf_map_attachment_locked, and adjust the entire callchain to the new
locking scheme.

The real trouble here imo is that i915 CI is just the canary, I expect a
bunch of other drivers will also look at an inverted locking hierarchy if
dma_buf_map_attachment needs the reservation lock. And there's no
convenient CI for them, and code audit won't cut it really (at least I'm
too stupid to keep the locking hierarchy of an entire driver in my head).


We chatted about this on #intel-gfx and concluded that either solution 
derives to replacing the obj->mm.lock with the reservation lock. And 
that is problematic for i915, both from the reason of a general 
direction towards more fine-grained locking, and also issue that 
reservation lock needs to be avoided under the shrinker path (we lock 
obj->mm.lock when dropping the backing store there).


I proposed that maybe we could re-jig how we use obj->mm.lock a bit, to 
ensure backing store vfunc (get_pages) is not called under it (although 
I haven't thought it fully through it may be possible without too 
significant drawbacks), but Chris also has some patches which may work 
around this in a different way. So I'll wait to see those first.


On whether or not reservation lock is the right lock to use from dma-buf 
for this purpose I'll leave other guys to comment - I am not fully into 
the details of dma-buf design.


Regards,

Tvrtko
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] dma-buf: add caching of sg_table

2018-08-07 Thread Daniel Vetter
On Wed, Jul 18, 2018 at 03:24:26PM +0200, Christian König wrote:
> Hi Daniel,
> 
> Am 18.07.2018 um 14:07 schrieb Patchwork:
> > == Series Details ==
> > 
> > Series: series starting with [1/4] dma-buf: add caching of sg_table
> > URL   : https://patchwork.freedesktop.org/series/46778/
> > State : failure
> > [SNIP]
> 
> it looks like I'm a step further understanding the problems which come with
> this change.
> 
> I've more or less audited all use cases and think that only i915 is left
> with the following lock inversion: 
> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9705/fi-cfl-8700k/igt@gem_mmap_...@basic-small-bo-tiledx.html
> 
> Now my question is what is >mm.lock used for and why do you guys call
> dma_buf_map_attachment() while holding it?

obj->mm.lock is the lock to protect getting at the backing storage.
i915_gem_object_get_pages and _put_pages are the relevant functions.

Existing paths want to pin the backing storage while holding the
reservation lock. And your new path needs to do the inverse since
dma_buf_map_attachment now also requires the reservation lock. And that is
obviously called from within the dma-buf importer version of get_pages.

I think there's 2 solutions:

- Merge obj->mm.lock and the reservation lock. Probably the cleaner
  solution, but likely more work.

- Make sure the obj->mm.lock always nests within the reservation lock, and
  grab the reservation lock anywhere it's not yet grabbed. Then you can
  use the dma_buf_map_attachment_locked variant in
  i915_gem_object_get_pages_dmabuf to avoid the locking inversion. This
  would essentially make the obj->mm.lock fully redundant.

Either way is going to be quite a bit of work. I expect that you need to
replace all the cases of dma_buf_map_attachment in i915 with
dma_buf_map_attachment_locked, and adjust the entire callchain to the new
locking scheme.

The real trouble here imo is that i915 CI is just the canary, I expect a
bunch of other drivers will also look at an inverted locking hierarchy if
dma_buf_map_attachment needs the reservation lock. And there's no
convenient CI for them, and code audit won't cut it really (at least I'm
too stupid to keep the locking hierarchy of an entire driver in my head).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] dma-buf: add caching of sg_table

2018-07-18 Thread Christian König

Hi Daniel,

Am 18.07.2018 um 14:07 schrieb Patchwork:

== Series Details ==

Series: series starting with [1/4] dma-buf: add caching of sg_table
URL   : https://patchwork.freedesktop.org/series/46778/
State : failure
[SNIP]


it looks like I'm a step further understanding the problems which come 
with this change.


I've more or less audited all use cases and think that only i915 is left 
with the following lock inversion: 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_9705/fi-cfl-8700k/igt@gem_mmap_...@basic-small-bo-tiledx.html


Now my question is what is >mm.lock used for and why do you guys 
call dma_buf_map_attachment() while holding it?


Thanks in advance,
Christian.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


[Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/4] dma-buf: add caching of sg_table

2018-07-18 Thread Patchwork
== Series Details ==

Series: series starting with [1/4] dma-buf: add caching of sg_table
URL   : https://patchwork.freedesktop.org/series/46778/
State : failure

== Summary ==

= CI Bug Log - changes from CI_DRM_4503 -> Patchwork_9705 =

== Summary - FAILURE ==

  Serious unknown changes coming with Patchwork_9705 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_9705, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: 
https://patchwork.freedesktop.org/api/1.0/series/46778/revisions/1/mbox/

== Possible new issues ==

  Here are the unknown changes that may have been introduced in Patchwork_9705:

  === IGT changes ===

 Possible regressions 

igt@drv_selftest@live_hangcheck:
  fi-cfl-guc: PASS -> DMESG-FAIL

igt@gem_mmap_gtt@basic-small-bo-tiledx:
  {fi-cfl-8109u}: PASS -> DMESG-WARN
  fi-glk-dsi: PASS -> DMESG-WARN
  fi-hsw-peppy:   PASS -> DMESG-WARN
  fi-bsw-n3050:   PASS -> DMESG-WARN
  fi-bxt-j4205:   PASS -> DMESG-WARN
  fi-skl-6700hq:  PASS -> DMESG-WARN
  fi-cfl-guc: PASS -> DMESG-WARN
  fi-cfl-s3:  PASS -> DMESG-WARN
  fi-bxt-dsi: PASS -> DMESG-WARN
  fi-ivb-3770:PASS -> DMESG-WARN
  fi-skl-6700k2:  PASS -> DMESG-WARN
  fi-whl-u:   PASS -> DMESG-WARN
  fi-cfl-8700k:   PASS -> DMESG-WARN
  fi-byt-j1900:   PASS -> DMESG-WARN
  fi-hsw-4770:PASS -> DMESG-WARN
  fi-kbl-7560u:   PASS -> DMESG-WARN
  fi-skl-6600u:   PASS -> DMESG-WARN
  fi-kbl-guc: PASS -> DMESG-WARN
  {fi-kbl-8809g}: NOTRUN -> DMESG-WARN
  fi-kbl-7567u:   PASS -> DMESG-WARN
  fi-skl-guc: PASS -> DMESG-WARN
  fi-glk-j4005:   PASS -> DMESG-WARN
  fi-bdw-5557u:   PASS -> DMESG-WARN
  fi-bwr-2160:PASS -> DMESG-WARN
  fi-kbl-r:   PASS -> DMESG-WARN
  fi-blb-e6850:   PASS -> DMESG-WARN
  fi-kbl-x1275:   PASS -> DMESG-WARN
  fi-skl-gvtdvm:  PASS -> DMESG-WARN
  fi-hsw-4770r:   PASS -> DMESG-WARN
  fi-skl-6260u:   PASS -> DMESG-WARN
  fi-snb-2600:PASS -> DMESG-WARN
  fi-ilk-650: PASS -> DMESG-WARN
  fi-elk-e7500:   PASS -> DMESG-WARN
  fi-skl-6770hq:  PASS -> DMESG-WARN
  fi-byt-n2820:   PASS -> DMESG-WARN
  fi-snb-2520m:   PASS -> DMESG-WARN
  fi-pnv-d510:PASS -> DMESG-WARN
  fi-kbl-7500u:   PASS -> DMESG-WARN
  fi-ivb-3520m:   PASS -> DMESG-WARN


 Warnings 

igt@gem_exec_gttfill@basic:
  fi-pnv-d510:PASS -> SKIP


== Known issues ==

  Here are the changes found in Patchwork_9705 that come from known issues:

  === IGT changes ===

 Issues hit 

igt@drv_selftest@live_hangcheck:
  fi-kbl-guc: PASS -> DMESG-FAIL (fdo#106947)

igt@gem_exec_suspend@basic-s4-devices:
  {fi-kbl-8809g}: NOTRUN -> DMESG-WARN (fdo#107139, fdo#107222)

igt@kms_frontbuffer_tracking@basic:
  fi-hsw-peppy:   PASS -> DMESG-FAIL (fdo#102614, fdo#106103)

igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
  fi-bxt-dsi: PASS -> INCOMPLETE (fdo#103927)

igt@prime_vgem@basic-fence-flip:
  fi-ilk-650: PASS -> FAIL (fdo#104008)


 Possible fixes 

igt@drv_selftest@live_hangcheck:
  {fi-cfl-8109u}: DMESG-FAIL (fdo#106560) -> PASS

igt@drv_selftest@live_workarounds:
  {fi-cfl-8109u}: DMESG-FAIL -> PASS

igt@kms_chamelium@dp-crc-fast:
  fi-kbl-7500u:   FAIL (fdo#103841) -> PASS +1

igt@kms_flip@basic-flip-vs-dpms:
  fi-skl-6700hq:  DMESG-WARN (fdo#105998) -> PASS

igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
  fi-snb-2520m:   INCOMPLETE (fdo#103713) -> PASS


  {name}: This element is suppressed. This means it is ignored when computing
  the status of the difference (SUCCESS, WARNING, or FAILURE).

  fdo#102614 https://bugs.freedesktop.org/show_bug.cgi?id=102614
  fdo#103713 https://bugs.freedesktop.org/show_bug.cgi?id=103713
  fdo#103841 https://bugs.freedesktop.org/show_bug.cgi?id=103841
  fdo#103927 https://bugs.freedesktop.org/show_bug.cgi?id=103927
  fdo#104008 https://bugs.freedesktop.org/show_bug.cgi?id=104008
  fdo#105998 https://bugs.freedesktop.org/show_bug.cgi?id=105998
  fdo#106103 https://bugs.freedesktop.org/show_bug.cgi?id=106103
  fdo#106560 https://bugs.freedesktop.org/show_bug.cgi?id=106560
  fdo#106947 https://bugs.freedesktop.org/show_bug.cgi?id=106947
  fdo#107139 https://bugs.freedesktop.org/show_bug.cgi?id=107139
  fdo#107222 https://bugs.freedesktop.org/show_bug.cgi?id=107222


== Participating hosts (46 -> 41) ==

  Additional (1): fi-kbl-8809g 
  Missing(6): fi-ilk-m540